
THE DESIGN OF EVERYDAY THINGS

THE PSYCHOPATHOLOGY OF EVERYDAY THINGS

If I were placed in the cockpit of a modern jet airliner, my inability to perform well would neither surprise nor bother me. But why should I have trouble with doors and light switches, water faucets and stoves? “Doors?” I can hear the reader saying. “You have trouble opening doors?” Yes. I push doors that are meant to be pulled, pull doors that should be pushed, and walk into doors that neither pull nor push, but slide. Moreover, I see others having the same troubles—unnecessary troubles. My problems with doors have become so well known that confusing doors are often called “Norman doors.” Imagine becoming famous for doors that don’t work right. I’m pretty sure that’s not what my parents planned for me. (Put “Norman doors” into your favorite search engine—be sure to include the quote marks: it makes for fascinating reading.)

How can such a simple thing as a door be so confusing? A door would seem to be about as simple a device as possible. There is not much you can do to a door: you can open it or shut it. Suppose you are in an office building, walking down a corridor. You come to a door. How does it open? Should you push or pull, on the left or the right? Maybe the door slides. If so, in which direction? I have seen doors that slide to the left, to the right, and even up into the ceiling.

FIGURE 1.1. Coffeepot for Masochists. The French artist Jacques Carelman in his series of books Catalogue d’objets introuvables (Catalog of unfindable objects) provides delightful examples of everyday things that are deliberately unworkable, outrageous, or otherwise ill-formed. One of my favorite items is what he calls “coffeepot for masochists.” The photograph shows a copy given to me by colleagues at the University of California, San Diego. It is one of my treasured art objects. (Photograph by Aymin Shamma for the author.)

The design of the door should indicate how to work it without any need for signs, certainly without any need for trial and error. A friend told me of the time he got trapped in the doorway of a post office in a European city. The entrance was an imposing row of six glass swinging doors, followed immediately by a second, identical row. That’s a standard design: it helps reduce the airflow and thus maintain the indoor temperature of the building. There was no visible hardware: obviously the doors could swing in either direction: all a person had to do was push the side of the door and enter.

My friend pushed on one of the outer doors. It swung inward, and he entered the building. Then, before he could get to the next row of doors, he was distracted and turned around for an instant. He didn’t realize it at the time, but he had moved slightly to the right. So when he came to the next door and pushed it, nothing happened. “Hmm,” he thought, “must be locked.” So he pushed the side of the adjacent door. Nothing. Puzzled, my friend decided to go outside again. He turned around and pushed against the side of a door. Nothing. He pushed the adjacent door. Nothing. The door he had just entered no longer worked. He turned around once more and tried the inside doors again. Nothing. Concern, then mild panic. He was trapped! Just then, a group of people on the other side of the entranceway (to my friend’s right) passed easily through both sets of doors. My friend hurried over to follow their path.

How could such a thing happen? A swinging door has two sides. One contains the supporting pillar and the hinge, the other is unsupported. To open the door, you must push or pull on the unsupported edge. If you push on the hinge side, nothing happens. In my friend’s case, he was in a building where the designer aimed for beauty, not utility. No distracting lines, no visible pillars, no visible hinges. So how can the ordinary user know which side to push on? While distracted, my friend had moved toward the (invisible) supporting pillar, so he was pushing the doors on the hinged side. No wonder nothing happened. Attractive doors. Stylish. Probably won a design prize.

Two of the most important characteristics of good design are discoverability and understanding. Discoverability: Is it possible to even figure out what actions are possible and where and how to perform them? Understanding: What does it all mean? How is the product supposed to be used? What do all the different controls and settings mean?

The doors in the story illustrate what happens when discoverability fails. Whether the device is a door or a stove, a mobile phone or a nuclear power plant, the relevant components must be visible, and they must communicate the correct message: What actions are possible? Where and how should they be done? With doors that push, the designer must provide signals that naturally indicate where to push. These need not destroy the aesthetics. Put a vertical plate on the side to be pushed. Or make the supporting pillars visible. The vertical plate and supporting pillars are natural signals, naturally interpreted, making it easy to know just what to do: no labels needed.

With complex devices, discoverability and understanding require the aid of manuals or personal instruction. We accept this if the device is indeed complex, but it should be unnecessary for simple things. Many products defy understanding simply because they have too many functions and controls. I don’t think that simple home appliances—stoves, washing machines, audio and television sets—should look like Hollywood’s idea of a spaceship control room. They already do, much to our consternation. Faced with a bewildering array of controls and displays, we simply memorize one or two fixed settings to approximate what is desired. In England I visited a home with a fancy new Italian washer-dryer combination, with super-duper multisymbol controls, all to do everything anyone could imagine doing with the washing and drying of clothes. The husband (an engineering psychologist) said he refused to go near it. The wife (a physician) said she had simply memorized one setting and tried to ignore the rest. I asked to see the manual: it was just as confusing as the device. The whole purpose of the design is lost.

The Complexity of Modern Devices

All artificial things are designed. Whether it is the layout of furniture in a room, the paths through a garden or forest, or the intricacies of an electronic device, some person or group of people had to decide upon the layout, operation, and mechanisms. Not all designed things involve physical structures. Services, lectures, rules and procedures, and the organizational structures of businesses and governments do not have physical mechanisms, but their rules of operation have to be designed, sometimes informally, sometimes precisely recorded and specified.

But even though people have designed things since prehistoric times, the field of design is relatively new, divided into many areas of specialty. Because everything is designed, the number of areas is enormous, ranging from clothes and furniture to complex control rooms and bridges. This book covers everyday things, focusing on the interplay between technology and people to ensure that the products actually fulfill human needs while being understandable and usable. In the best of cases, the products should also be delightful and enjoyable, which means that not only must the requirements of engineering, manufacturing, and ergonomics be satisfied, but attention must be paid to the entire experience, which means the aesthetics of form and the quality of interaction. The major areas of design relevant to this book are industrial design, interaction design, and experience design. None of the fields is well defined, but the focus of the efforts does vary, with industrial designers emphasizing form and material, interaction designers emphasizing understandability and usability, and experience designers emphasizing the emotional impact. Thus:

Industrial design: The professional service of creating and developing concepts and specifications that optimize the function, value, and appearance of products and systems for the mutual benefit of both user and manufacturer (from the Industrial Design Society of America’s website).

Interaction design: The focus is upon how people interact with technology. The goal is to enhance people’s understanding of what can be done, what is happening, and what has just occurred. Interaction design draws upon principles of psychology, design, art, and emotion to ensure a positive, enjoyable experience.

Experience design: The practice of designing products, processes, services, events, and environments with a focus placed on the quality and enjoyment of the total experience.

Design is concerned with how things work, how they are controlled, and the nature of the interaction between people and technology. When done well, the results are brilliant, pleasurable products. When done badly, the products are unusable, leading to great frustration and irritation. Or they might be usable, but force us to behave the way the product wishes rather than as we wish. Machines, after all, are conceived, designed, and constructed by people. By human standards, machines are pretty limited. They do not maintain the same kind of rich history of experiences that people have in common with one another, experiences that enable us to interact with others because of this shared understanding. Instead, machines usually follow rather simple, rigid rules of behavior. If we get the rules wrong even slightly, the machine does what it is told, no matter how insensible and illogical. People are imaginative and creative, filled with common sense; that is, a lot of valuable knowledge built up over years of experience. But instead of capitalizing on these strengths, machines require us to be precise and accurate, things we are not very good at. Machines have no leeway or common sense. Moreover, many of the rules followed by a machine are known only by the machine and its designers.

When people fail to follow these bizarre, secret rules, and the machine does the wrong thing, its operators are blamed for not understanding the machine, for not following its rigid specifications. With everyday objects, the result is frustration. With complex devices and commercial and industrial processes, the resulting difficulties can lead to accidents, injuries, and even deaths. It is time to reverse the situation: to cast the blame upon the machines and their design. It is the machine and its design that are at fault. It is the duty of machines and those who design them to understand people. It is not our duty to understand the arbitrary, meaningless dictates of machines.

The reasons for the deficiencies in human-machine interaction are numerous. Some come from the limitations of today’s technology. Some come from self-imposed restrictions by the designers, often to hold down cost. But most of the problems come from a complete lack of understanding of the design principles necessary for effective human-machine interaction. Why this deficiency? Because much of the design is done by engineers who are experts in technology but limited in their understanding of people. “We are people ourselves,” they think, “so we understand people.” But in fact, we humans are amazingly complex. Those who have not studied human behavior often think it is pretty simple. Engineers, moreover, make the mistake of thinking that logical explanation is sufficient: “If only people would read the instructions,” they say, “everything would be all right.”

Engineers are trained to think logically. As a result, they come to believe that all people must think this way, and they design their machines accordingly. When people have trouble, the engineers are upset, but often for the wrong reason. “What are these people doing?” they will wonder. “Why are they doing that?” The problem with the designs of most engineers is that they are too logical. We have to accept human behavior the way it is, not the way we would wish it to be.

I used to be an engineer, focused upon technical requirements, quite ignorant of people. Even after I switched into psychology and cognitive science, I still maintained my engineering emphasis upon logic and mechanism. It took a long time for me to realize that my understanding of human behavior was relevant to my interest in the design of technology. As I watched people struggle with technology, it became clear that the difficulties were caused by the technology, not the people.

I was called upon to help analyze the American nuclear power plant accident at Three Mile Island (the island name comes from the fact that it is located on a river, three miles south of Middletown in the state of Pennsylvania). In this incident, a rather simple mechanical failure was misdiagnosed. This led to several days of difficulties and confusion, total destruction of the reactor, and a very close call to a severe radiation release, all of which brought the American nuclear power industry to a complete halt. The operators were blamed for these failures: “human error” was the immediate analysis. But the committee I was on discovered that the plant’s control rooms were so poorly designed that error was inevitable: design was at fault, not the operators. The moral was simple: we were designing things for people, so we needed to understand both technology and people. But that’s a difficult step for many engineers: machines are so logical, so orderly. If we didn’t have people, everything would work so much better. Yup, that’s how I used to think.

My work with that committee changed my view of design. Today, I realize that design presents a fascinating interplay of technology and psychology, that the designers must understand both. Engineers still tend to believe in logic. They often explain to me in great, logical detail, why their designs are good, powerful, and wonderful. “Why are people having problems?” they wonder. “You are being too logical,” I say. “You are designing for people the way you would like them to be, not for the way they really are.” When the engineers object, I ask whether they have ever made an error, perhaps turning on or off the wrong light, or the wrong stove burner. “Oh yes,” they say, “but those were errors.” That’s the point: even experts make errors. So we must design our machines on the assumption that people will make errors. (Chapter 5 provides a detailed analysis of human error.)

Human-Centered Design

People are frustrated with everyday things. From the ever-increasing complexity of the automobile dashboard, to the increasing automation in the home with its internal networks, complex music, video, and game systems for entertainment and communication, and the increasing automation in the kitchen, everyday life sometimes seems like a never-ending fight against confusion, continued errors, frustration, and a continual cycle of updating and maintaining our belongings.

In the multiple decades that have elapsed since the first edition of this book was published, design has gotten better. There are now many books and courses on the topic. But even though much has improved, the rapid rate of technology change outpaces the advances in design. New technologies, new applications, and new methods of interaction are continually arising and evolving. New industries spring up. Each new development seems to repeat the mistakes of the earlier ones; each new field requires time before it, too, adopts the principles of good design. And each new invention of technology or interaction technique requires experimentation and study before the principles of good design can be fully integrated into practice. So, yes, things are getting better, but as a result, the challenges are ever present.

The solution is human-centered design (HCD), an approach that puts human needs, capabilities, and behavior first, then designs to accommodate those needs, capabilities, and ways of behaving. Good design starts with an understanding of psychology and technology. Good design requires good communication, especially from machine to person, indicating what actions are possible, what is happening, and what is about to happen. Communication is especially important when things go wrong. It is relatively easy to design things that work smoothly and harmoniously as long as things go right. But as soon as there is a problem or a misunderstanding, the problems arise. This is where good design is essential. Designers need to focus their attention on the cases where things go wrong, not just on when things work as planned. Actually, this is where the most satisfaction can arise: when something goes wrong but the machine highlights the problems, then the person understands the issue, takes the proper actions, and the problem is solved. When this happens smoothly, the collaboration of person and device feels wonderful.

TABLE 1.1. The Role of HCD and Design Specializations

Experience design, industrial design, interaction design: These are areas of focus.

Human-centered design: The process that ensures that the designs match the needs and capabilities of the people for whom they are intended.

Human-centered design is a design philosophy. It means starting with a good understanding of people and the needs that the design is intended to meet. This understanding comes about primarily through observation, for people themselves are often unaware of their true needs, even unaware of the difficulties they are encountering. Getting the specification of the thing to be defined is one of the most difficult parts of the design, so much so that the HCD principle is to avoid specifying the problem as long as possible but instead to iterate upon repeated approximations. This is done through rapid tests of ideas, and after each test modifying the approach and the problem definition. The results can be products that truly meet the needs of people. Doing HCD within the rigid time, budget, and other constraints of industry can be a challenge: Chapter 6 examines these issues.

Where does HCD fit into the earlier discussion of the several different forms of design, especially the areas called industrial, interaction, and experience design? These are all compatible. HCD is a philosophy and a set of procedures, whereas the others are areas of focus (see Table 1.1). The philosophy and procedures of HCD add deep consideration and study of human needs to the design process, whatever the product or service, whatever the major focus.

Fundamental Principles of Interaction

Great designers produce pleasurable experiences. Experience: note the word. Engineers tend not to like it; it is too subjective. But when I ask them about their favorite automobile or test equipment, they will smile delightedly as they discuss the fit and finish, the sensation of power during acceleration, their ease of control while shifting or steering, or the wonderful feel of the knobs and switches on the instrument. Those are experiences.

Experience is critical, for it determines how fondly people remember their interactions. Was the overall experience positive, or was it frustrating and confusing? When our home technology behaves in an uninterpretable fashion we can become confused, frustrated, and even angry—all strong negative emotions. When there is understanding it can lead to a feeling of control, of mastery, and of satisfaction or even pride—all strong positive emotions. Cognition and emotion are tightly intertwined, which means that the designers must design with both in mind.

When we interact with a product, we need to figure out how to work it. This means discovering what it does, how it works, and what operations are possible: discoverability. Discoverability results from appropriate application of five fundamental psychological concepts covered in the next few chapters: affordances, signifiers, constraints, mappings, and feedback. But there is a sixth principle, perhaps most important of all: the conceptual model of the system. It is the conceptual model that provides true understanding. So I now turn to these fundamental principles, starting with affordances, signifiers, mappings, and feedback, then moving to conceptual models. Constraints are covered in Chapters 3 and 4.

AFFORDANCES

We live in a world filled with objects, many natural, the rest artificial. Every day we encounter thousands of objects, many of them new to us. Many of the new objects are similar to ones we already know, but many are unique, yet we manage quite well. How do we do this? Why is it that when we encounter many unusual natural objects, we know how to interact with them? Why is this true with many of the artificial, human-made objects we encounter? The answer lies with a few basic principles. Some of the most important of these principles come from a consideration of affordances.

The term affordance refers to the relationship between a physical object and a person (or for that matter, any interacting agent, whether animal or human, or even machines and robots). An affordance is a relationship between the properties of an object and the capabilities of the agent that determine just how the object could possibly be used. A chair affords (“is for”) support and, therefore, affords sitting. Most chairs can also be carried by a single person (they afford lifting), but some can only be lifted by a strong person or by a team of people. If young or relatively weak people cannot lift a chair, then for these people, the chair does not have that affordance, it does not afford lifting.

The presence of an affordance is jointly determined by the qualities of the object and the abilities of the agent that is interacting. This relational definition of affordance gives considerable difficulty to many people. We are used to thinking that properties are associated with objects. But affordance is not a property. An affordance is a relationship. Whether an affordance exists depends upon the properties of both the object and the agent.
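The relational definition above can be restated as a toy sketch in code (all the names here, such as `Agent`, `Chair`, and `affords_lifting`, are illustrative inventions, not anything from the text): the affordance is not stored on the object, but computed from the pairing of object properties and agent capabilities.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    lift_strength_kg: float   # the agent's capability

@dataclass
class Chair:
    weight_kg: float          # the object's property

def affords_lifting(chair: Chair, agent: Agent) -> bool:
    """An affordance is a relation: it holds, or fails to hold, only for
    a particular pairing of object properties and agent capabilities."""
    return agent.lift_strength_kg >= chair.weight_kg

chair = Chair(weight_kg=7.0)
adult = Agent(lift_strength_kg=25.0)
child = Agent(lift_strength_kg=5.0)

print(affords_lifting(chair, adult))  # True: for the adult, the chair affords lifting
print(affords_lifting(chair, child))  # False: for the child, the same chair does not
```

The same chair, queried with two different agents, yields two different answers, which is exactly why the chapter insists that an affordance is a relationship rather than a property.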

Glass affords transparency. At the same time, its physical structure blocks the passage of most physical objects. As a result, glass affords seeing through and support, but not the passage of air or most physical objects (atomic particles can pass through glass). The blockage of passage can be considered an anti-affordance—the prevention of interaction. To be effective, affordances and anti-affordances have to be discoverable—perceivable. This poses a difficulty with glass. The reason we like glass is its relative invisibility, but this aspect, so useful in the normal window, also hides its anti-affordance property of blocking passage. As a result, birds often try to fly through windows. And every year, numerous people injure themselves when they walk (or run) through closed glass doors or large picture windows. If an affordance or anti-affordance cannot be perceived, some means of signaling its presence is required: I call this property a signifier (discussed in the next section).

The notion of affordance and the insights it provides originated with J. J. Gibson, an eminent psychologist who provided many advances to our understanding of human perception. I had interacted with him over many years, sometimes in formal conferences and seminars, but most fruitfully over many bottles of beer, late at night, just talking. We disagreed about almost everything. I was an engineer who became a cognitive psychologist, trying to understand how the mind works. He started off as a Gestalt psychologist, but then developed an approach that is today named after him: Gibsonian psychology, an ecological approach to perception. He argued that the world contained the clues and that people simply picked them up through “direct perception.” I argued that nothing could be direct: the brain had to process the information arriving at the sense organs to put together a coherent interpretation. “Nonsense,” he loudly proclaimed; “it requires no interpretation: it is directly perceived.” And then he would put his hand to his ears, and with a triumphant flourish, turn off his hearing aids: my counterarguments would fall upon deaf ears—literally.

When I pondered my question—how do people know how to act when confronted with a novel situation—I realized that a large part of the answer lay in Gibson’s work. He pointed out that all the senses work together, that we pick up information about the world by the combined result of all of them. “Information pickup” was one of his favorite phrases, and Gibson believed that the combined information picked up by all of our sensory apparatus—sight, sound, smell, touch, balance, kinesthetic, acceleration, body position—determines our perceptions without the need for internal processing or cognition. Although he and I disagreed about the role played by the brain’s internal processing, his brilliance was in focusing attention on the rich amount of information present in the world. Moreover, the physical objects conveyed important information about how people could interact with them, a property he named “affordance.”

Affordances exist even if they are not visible. For designers, their visibility is critical: visible affordances provide strong clues to the operations of things. A flat plate mounted on a door affords pushing. Knobs afford turning, pushing, and pulling. Slots are for inserting things into. Balls are for throwing or bouncing. Perceived affordances help people figure out what actions are possible without the need for labels or instructions. I call the signaling component of affordances signifiers.

SIGNIFIERS

Are affordances important to designers? The first edition of this book introduced the term affordances to the world of design. The design community loved the concept and affordances soon propagated into the instruction and writing about design. I soon found mention of the term everywhere. Alas, the term became used in ways that had nothing to do with the original.

Many people find affordances difficult to understand because they are relationships, not properties. Designers deal with fixed properties, so there is a temptation to say that the property is an affordance. But that is not the only problem with the concept of affordances.

Designers have practical problems. They need to know how to design things to make them understandable. They soon discovered that when working with the graphical designs for electronic displays, they needed a way to designate which parts could be touched, slid upward, downward, or sideways, or tapped upon. The actions could be done with a mouse, stylus, or fingers. Some systems responded to body motions, gestures, and spoken words, with no touching of any physical device. How could designers describe what they were doing? There was no word that fit, so they took the closest existing word—affordance. Soon designers were saying such things as, “I put an affordance there,” to describe why they displayed a circle on a screen to indicate where the person should touch, whether by mouse or by finger. “No,” I said, “that is not an affordance. That is a way of communicating where the touch should be. You are communicating where to do the touching: the affordance of touching exists on the entire screen: you are trying to signify where the touch should take place. That’s not the same thing as saying what action is possible.”

Not only did my explanation fail to satisfy the design community, but I myself was unhappy. Eventually I gave up: designers needed a word to describe what they were doing, so they chose affordance. What alternative did they have? I decided to provide a better answer: signifiers. Affordances determine what actions are possible. Signifiers communicate where the action should take place. We need both.
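The touchscreen example can be put in miniature code form (a toy model; `Screen`, `Signifier`, and every name in it are illustrative, not from the text): the affordance of touching belongs to the entire screen surface, while the signifier is a separate, perceivable mark that only communicates where a touch is meaningful.

```python
from dataclasses import dataclass

@dataclass
class Screen:
    width: int
    height: int

    def affords_touch(self, x: int, y: int) -> bool:
        # The affordance of touching covers the whole screen surface.
        return 0 <= x < self.width and 0 <= y < self.height

@dataclass
class Signifier:
    # A perceivable mark (a drawn circle, say) communicating WHERE to act.
    x: int
    y: int
    label: str

screen = Screen(width=800, height=600)
mark = Signifier(x=400, y=500, label="tap here")

print(screen.affords_touch(10, 10))          # True: touching is possible anywhere on screen
print(screen.affords_touch(mark.x, mark.y))  # True: the mark only signals where it matters
```

Deleting `mark` would not change what `affords_touch` returns anywhere on the screen; the action remains possible, only the communication is lost, which is the distinction the passage draws.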

People need some way of understanding the product or service they wish to use, some sign of what it is for, what is happening, and what the alternative actions are. People search for clues, for any sign that might help them cope and understand. It is the sign that is important, anything that might signify meaningful information. Designers need to provide these clues. What people need, and what designers must provide, are signifiers. Good design requires, among other things, good communication of the purpose, structure, and operation of the device to the people who use it. That is the role of the signifier.

The term signifier has had a long and illustrious career in the exotic field of semiotics, the study of signs and symbols. But just as I appropriated affordance to use in design in a manner somewhat different than its inventor had intended, I use signifier in a somewhat different way than it is used in semiotics. For me, the term signifier refers to any mark or sound, any perceivable indicator that communicates appropriate behavior to a person.

Signifiers can be deliberate and intentional, such as the sign push on a door, but they may also be accidental and unintentional, such as our use of the visible trail made by previous people walking through a field or over a snow-covered terrain to determine the best path. Or how we might use the presence or absence of people waiting at a train station to determine whether we have missed the train. (I explain these ideas in more detail in my book Living with Complexity.)

FIGURE 1.2. Problem Doors: Signifiers Are Needed. Door hardware can signal whether to push or pull without signs, but the hardware of the two doors in the upper photo, A, is identical even though one should be pushed, the other pulled. The flat, ribbed horizontal bar has the obvious perceived affordance of pushing, but as the signs indicate, the door on the left is to be pulled, the one on the right is to be pushed. In the bottom pair of photos, B and C, there are no visible signifiers or affordances. How does one know which side to push? Trial and error. When external signifiers—signs—have to be added to something as simple as a door, it indicates bad design. (Photographs by the author.)

The signifier is an important communication device to the recipient, whether or not communication was intended. It doesn’t matter whether the useful signal was deliberately placed or whether it is incidental: there is no necessary distinction. Why should it matter whether a flag was placed as a deliberate clue to wind direction (as is done at airports or on the masts of sailboats) or was there as an advertisement or symbol of pride in one’s country (as is done on public buildings)? Once I interpret a flag’s motion to indicate wind direction, it does not matter why it was placed there.

Consider a bookmark, a deliberately placed signifier of one’s place in reading a book. But the physical nature of books also makes a bookmark an accidental signifier, for its placement also indicates how much of the book remains. Most readers have learned to use this accidental signifier to aid in their enjoyment of the reading. With few pages left, we know the end is near. And if the reading is torturous, as in a school assignment, one can always console oneself by knowing there are “only a few more pages to get through.” Electronic book readers do not have the physical structure of paper books, so unless the software designer deliberately provides a clue, they do not convey any signal about the amount of text remaining.

A.

C.

Sliding Doors: Seldom Done Well. Sliding doors are seldom signified properly. The top two photographs show the sliding door to the toilet on an Amtrak train in the United States. The handle clearly signifies “pull,” but in fact, it needs to be rotated and the door slid to the right. The owner of the store in Shanghai, China, Photo C, solved the problem with a sign. “don’t push!” it says, in both English and Chinese. Amtrak’s toilet door could have used a similar kind of sign. (Photographs by the author.)

FIGURE 1.3

Whatever their nature, planned or accidental, signifiers provide valuable clues as to the nature of the world and of social activities. For us to function in this social, technological world, we need to develop internal models of what things mean, of how they operate. We seek all the clues we can find to help in this enterprise, and in this way, we are detectives, searching for whatever guidance we might find. If we are fortunate, thoughtful designers provide the clues for us. Otherwise, we must use our own creativity and imagination.


FIGURE 1.4


The Sink That Would Not Drain: Where Signifiers Fail. I washed my hands in my hotel sink in London, but then, as shown in Photo A, was left with the question of how to empty the sink of the dirty water. I searched all over for a control: none. I tried prying open the sink stopper with a spoon (Photo B): failure. I finally left my hotel room and went to the front desk to ask for instructions. (Yes, I actually did.) “Push down on the stopper,” I was told. Yes, it worked (Photos C and D). But how was anyone to ever discover this? And why should I have to put my clean hands back into the dirty water to empty the sink? The problem here is not just the lack of a signifier: it is the faulty decision to produce a stopper that requires people to dirty their clean hands to use it. (Photographs by the author.)

Affordances, perceived affordances, and signifiers have much in common, so let me pause to ensure that the distinctions are clear. Affordances represent the possibilities in the world for how an agent (a person, animal, or machine) can interact with something. Some affordances are perceivable, others are invisible. Signifiers are signals. Some signifiers are signs, labels, and drawings placed in the world, such as the signs labeled “push,” “pull,” or “exit” on doors, or arrows and diagrams indicating what is to be acted upon or in which direction to gesture, or other instructions. Some signifiers are simply the perceived affordances, such as the handle of a door or the physical structure of a switch. Note that some perceived affordances may not be real: they may look like doors or places to push, or an impediment to entry, when in fact they are not. These are misleading signifiers, oftentimes accidental but sometimes purposeful, as when trying to keep people from doing actions for which they are not qualified, or in games, where one of the challenges is to figure out what is real and what is not.

FIGURE 1.5

Accidental Affordances Can Become Strong Signifiers. This wall, at the Industrial Design department of KAIST, in Korea, provides an anti-affordance, preventing people from falling down the stair shaft. Its top is flat, an accidental by-product of the design. But flat surfaces afford support, and as soon as one person discovers it can be used to dispose of empty drink containers, the discarded container becomes a signifier, telling others that it is permissible to discard their items there. (Photographs by the author.)


My favorite example of a misleading signifier is a row of vertical pipes across a service road that I once saw in a public park. The pipes obviously blocked cars and trucks from driving on that road: they were good examples of anti-affordances. But to my great surprise, I saw a park vehicle simply go through the pipes. Huh? I walked over and examined them: the pipes were made of rubber, so vehicles could simply drive right over them. A very clever signifier, signaling a blocked road (via an apparent anti-affordance) to the average person, but permitting passage for those who knew.

To summarize:

In design, signifiers are more important than affordances, for they communicate how to use the design. A signifier can be words, a graphical illustration, or just a device whose perceived affordances are unambiguous. Creative designers incorporate the signifying part of the design into a cohesive experience. For the most part, designers can focus upon signifiers.

Because affordances and signifiers are fundamentally important principles of good design, they show up frequently in the pages of this book. Whenever you see hand-lettered signs pasted on doors, switches, or products, trying to explain how to work them, what to do and what not to do, you are also looking at poor design.

AFFORDANCES AND SIGNIFIERS: A CONVERSATION

A designer approaches his mentor. He is working on a system that recommends restaurants to people, based upon their preferences and those of their friends. But in his tests, he discovered that people never used all of the features. “Why not?” he asks his mentor.

(With apologies to Socrates.)

DESIGNER

I’m frustrated; people aren’t using our application properly.

The screen shows the restaurant that we recommend. It matches their preferences, and their friends like it as well. If they want to see other recommendations, all they have to do is swipe left or right. To learn more about a place, just swipe up for a menu or down to see if any friends are there now. People seem to find the other recommendations, but not the menus or their friends? I don’t understand.

I don’t know. Should I add some affordances? Suppose I put an arrow on each edge and add a label saying what they do.

Yes, you have a point. But the affor dances weren’t visible. I made them visible.

Yes, isn’t that what I said?

Oh, I see. But then why do designers care about affordances? Perhaps we should focus our attention on signifiers.

MAPPING

Mapping is a technical term, borrowed from mathematics, meaning the relationship between the elements of two sets of things. Suppose there are many lights in the ceiling of a classroom or auditorium and a row of light switches on the wall at the front of the room. The mapping of switches to lights specifies which switch controls which light.

FIGURE 1.6

Signifiers on a Touch Screen. The arrows and icons are signifiers: they provide signals about the permissible operations for this restaurant guide. Swiping left or right brings up new restaurant recommendations. Swiping up reveals the menu for the restaurant being displayed; swiping down, friends who recommend the restaurant.
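In programming terms, a mapping between two sets is just a lookup table or function. A minimal sketch of the switch-to-light mapping (the switch and light names are invented for illustration):

```python
# A mapping between two sets: switches and the lights they control.
# The switch and light names are hypothetical.
switch_to_light = {
    "switch_1": "front lights",
    "switch_2": "middle lights",
    "switch_3": "rear lights",
}

def light_for(switch: str) -> str:
    """Return which light a given switch controls."""
    return switch_to_light[switch]
```

The table records the mapping completely, yet nothing about its keys tells a person standing at the wall which switch is which: making that relationship perceivable is the design problem.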

Mapping is an important concept in the design and layout of controls and displays. When the mapping uses spatial correspondence between the layout of the controls and the devices being controlled, it is easy to determine how to use them. In steering a car, we rotate the steering wheel clockwise to cause the car to turn right: the top of the wheel moves in the same direction as the car. Note that other choices could have been made. In early cars, steering was controlled by a variety of devices, including tillers, handlebars, and reins. Today, some vehicles use joysticks, much as in a computer game. In cars that used tillers, steering was done much as one steers a boat: move the tiller to the left to turn to the right. Tractors, construction equipment such as bulldozers and cranes, and military tanks that have tracks instead of wheels use separate controls for the speed and direction of each track: to turn right, the left track is increased in speed, while the right track is slowed or even reversed. This is also how a wheelchair is steered. All of these mappings for the control of vehicles work because each has a compelling conceptual model of how the operation of the control affects the vehicle. Thus, if we speed up the left wheel of a wheelchair while stopping the right wheel, it is easy to imagine the chair’s pivoting on the right wheel, circling to the right. In a small boat, we can understand the tiller by realizing that pushing the tiller to the left causes the ship’s rudder to move to the right and the resulting force of the water on the rudder slows down the right side of the boat, so that the boat rotates to the right. It doesn’t matter whether these conceptual models are accurate: what matters is that they provide a clear way of remembering and understanding the mappings. The relationship between a control and its results is easiest to learn wherever there is an understandable mapping between the controls, the actions, and the intended result.
Natural mapping, by which I mean taking advantage of spatial analogies, leads to immediate understanding. For example, to move an object up, move the control up. To make it easy to determine which control works which light in a large room or auditorium, arrange the controls in the same pattern as the lights. Some natural mappings are cultural or biological, as in the universal standard that moving the hand up signifies more, moving it down signifies less, which is why it is appropriate to use vertical position to represent intensity or amount. Other natural mappings follow from the principles of perception and allow for the natural grouping or patterning of controls and feedback. Groupings and proximity are important principles from Gestalt psychology that can be used to map controls to function: related controls should be grouped together. Controls should be close to the item being controlled. Note that there are many mappings that feel “natural” but in fact are specific to a particular culture: what is natural for one culture is not necessarily natural for another. In Chapter 3, I discuss how different cultures view time, which has important implications for some kinds of mappings.

FIGURE 1.7

Good Mapping: Automobile Seat Adjustment Control. This is an excellent example of natural mapping. The control is in the shape of the seat itself: the mapping is straightforward. To move the front edge of the seat higher, lift up on the front part of the button. To make the seat back recline, move the button back. The same principle could be applied to much more common objects. This particular control is from Mercedes-Benz, but this form of mapping is now used by many automobile companies. (Photograph by the author.)
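The advice to arrange the controls in the same pattern as the lights can itself be sketched in code. A hedged example, assuming a hypothetical 2×3 grid of ceiling lights:

```python
# Natural mapping: place each switch at the same grid position as the
# light it controls, so position alone signifies the pairing.
# The 2x3 layout and the light names are invented for illustration.
lights = [
    ["front-left", "front-center", "front-right"],
    ["rear-left",  "rear-center",  "rear-right"],
]

def light_at(row: int, col: int) -> str:
    """The switch at grid position (row, col) controls the light at the
    same position: no labels, and no trial and error, are needed."""
    return lights[row][col]
```

The spatial correspondence does the signifying work that an arbitrary row of identical switches cannot.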

A device is easy to use when the set of possible actions is visible, when the controls and displays exploit natural mappings. The principles are simple but rarely incorporated into design. Good design takes care, planning, thought, and an understanding of how people behave.

FEEDBACK

Ever watch people at an elevator repeatedly push the Up button, or repeatedly push the pedestrian button at a street crossing? Ever drive to a traffic intersection and wait an inordinate amount of time for the signals to change, wondering all the time whether the detection circuits noticed your vehicle (a common problem with bicycles)? What is missing in all these cases is feedback: some way of letting you know that the system is working on your request.

Feedback—communicating the results of an action—is a well-known concept from the science of control and information theory. Imagine trying to hit a target with a ball when you cannot see the target. Even as simple a task as picking up a glass with the hand requires feedback to aim the hand properly, to grasp the glass, and to lift it. A misplaced hand will spill the contents, too hard a grip will break the glass, and too weak a grip will allow it to fall. The human nervous system is equipped with numerous feedback mechanisms, including visual, auditory, and touch sensors, as well as vestibular and proprioceptive systems that monitor body position and muscle and limb movements. Given the importance of feedback, it is amazing how many products ignore it.

Feedback must be immediate: even a delay of a tenth of a second can be disconcerting. If the delay is too long, people often give up, going off to do other activities. This is annoying to the people, but it can also be wasteful of resources when the system spends considerable time and effort to satisfy the request, only to find that the intended recipient is no longer there. Feedback must also be informative. Many companies try to save money by using inexpensive lights or sound generators for feedback. These simple light flashes or beeps are usually more annoying than useful. They tell us that something has happened, but convey very little information about what has happened, and then nothing about what we should do about it. When the signal is auditory, in many cases we cannot even be certain which device has created the sound. If the signal is a light, we may miss it unless our eyes are on the correct spot at the correct time. Poor feedback can be worse than no feedback at all, because it is distracting, uninformative, and in many cases irritating and anxiety-provoking.
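One common way software meets the immediacy requirement is to acknowledge a request at once, even when the result will take longer to produce. A minimal sketch of that pattern; the callables `do_work`, `acknowledge`, and `report` are invented for illustration, not any real device API:

```python
import time

def handle_request(do_work, acknowledge, report):
    """Acknowledge immediately, then report the result when ready.

    do_work, acknowledge, and report are hypothetical callables; only
    the acknowledge-first pattern is the point.
    """
    acknowledge("request received")   # immediate feedback, well under 0.1 s
    start = time.monotonic()
    result = do_work()                # the slow part
    elapsed = time.monotonic() - start
    report(f"done ({elapsed:.2f}s): {result}")
```

The user learns within a fraction of a second that the system noticed the request, then later learns what the system did with it.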

Too much feedback can be even more annoying than too little. My dishwasher likes to beep at three a.m. to tell me that the wash is done, defeating my goal of having it work in the middle of the night so as not to disturb anyone (and to use less expensive electricity). But worst of all is inappropriate, uninterpretable feedback. The irritation caused by a “backseat driver” is well enough known that it is the staple of numerous jokes. Backseat drivers are often correct, but their remarks and comments can be so numerous and continuous that instead of helping, they become an irritating distraction. Machines that give too much feedback are like backseat drivers. Not only is it distracting to be subjected to continual flashing lights, text announcements, spoken voices, or beeps and boops, but it can be dangerous. Too many announcements cause people to ignore all of them, or wherever possible, disable all of them, which means that critical and important ones are apt to be missed. Feedback is essential, but not when it gets in the way of other things, including a calm and relaxing environment.

Poor design of feedback can be the result of decisions aimed at reducing costs, even if they make life more difficult for people. Rather than use multiple signal lights, informative displays, or rich, musical sounds with varying patterns, the focus upon cost reduction forces the design to use a single light or sound to convey multiple types of information. If the choice is to use a light, then one flash might mean one thing; two rapid flashes, something else. A long flash might signal yet another state; and a long flash followed by a brief one, yet another. If the choice is to use a sound, quite often the least expensive sound device is selected, one that can only produce a high-frequency beep. Just as with the lights, the only way to signal different states of the machine is by beeping different patterns. What do all these different patterns mean? How can we possibly learn and remember them? It doesn’t help that every different machine uses a different pattern of lights or beeps, sometimes with the same patterns meaning contradictory things for different machines. All the beeps sound alike, so it often isn’t even possible to know which machine is talking to us.

Feedback has to be planned. All actions need to be confirmed, but in a manner that is unobtrusive. Feedback must also be prioritized, so that unimportant information is presented in an unobtrusive fashion, but important signals are presented in a way that does capture attention. When there are major emergencies, then even important signals have to be prioritized. When every device is signaling a major emergency, nothing is gained by the resulting cacophony. The continual beeps and alarms of equipment can be dangerous. In many emergencies, workers have to spend valuable time turning off all the alarms because the sounds interfere with the concentration required to solve the problem. Hospital operating rooms, emergency wards. Nuclear power control plants. Airplane cockpits. All can become confusing, irritating, and life-endangering places because of excessive feedback, excessive alarms, and incompatible message coding. Feedback is essential, but it has to be done correctly. Appropriately.
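Prioritized feedback can be sketched as a simple filter: only signals at or above a threshold interrupt the user, while the rest are recorded unobtrusively. The severity levels and messages below are invented for illustration:

```python
# A sketch of prioritized feedback: important signals interrupt,
# everything else is logged quietly. Severity levels are hypothetical.
ALERT, WARNING, INFO = 2, 1, 0

def deliver(signals, interrupt, log, threshold=ALERT):
    """Route (severity, message) pairs by importance."""
    for severity, message in signals:
        if severity >= threshold:
            interrupt(message)   # demands attention
        else:
            log(message)         # unobtrusive record
```

The design question this dodges, of course, is the hard one: choosing severities so that the interruptions stay rare enough to be heeded.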

CONCEPTUAL MODELS

A conceptual model is an explanation, usually highly simplified, of how something works. It doesn’t have to be complete or even accurate as long as it is useful. The files, folders, and icons you see displayed on a computer screen help people create the conceptual model of documents and folders inside the computer, or of apps or applications residing on the screen, waiting to be summoned. In fact, there are no folders inside the computer—those are effective conceptualizations designed to make them easier to use. Sometimes these depictions can add to the confusion, however. When reading e-mail or visiting a website, the material appears to be on the device, for that is where it is displayed and manipulated. But in fact, in many cases the actual material is “in the cloud,” located on some distant machine. The conceptual model is of one, coherent image, whereas it may actually consist of parts, each located on different machines that could be almost anywhere in the world. This simplified model is helpful for normal usage, but if the network connection to the cloud services is interrupted, the result can be confusing. Information is still on their screen, but users can no longer save it or retrieve new things: their conceptual model offers no explanation. Simplified models are valuable only as long as the assumptions that support them hold true.

There are often multiple conceptual models of a product or device. People’s conceptual models for the way that regenerative braking in a hybrid or electrically powered automobile works are quite different for average drivers than for technically sophisticated drivers, different again for whoever must service the system, and yet different again for those who designed the system.

Conceptual models found in technical manuals and books for technical use can be detailed and complex. The ones we are concerned with here are simpler: they reside in the minds of the people who are using the product, so they are also “mental models.” Mental models, as the name implies, are the conceptual models in people’s minds that represent their understanding of how things work. Different people may hold different mental models of the same item. Indeed, a single person might have multiple models of the same item, each dealing with a different aspect of its operation: the models can even be in conflict.

Conceptual models are often inferred from the device itself. Some models are passed on from person to person. Some come from manuals. Usually the device itself offers very little assistance, so the model is constructed by experience. Quite often these models are erroneous, and therefore lead to difficulties in using the device. The major clues to how things work come from their perceived structure—in particular from signifiers, affordances, constraints, and mappings. Hand tools for the shop, gardening, and the house tend to make their critical parts sufficiently visible that conceptual models of their operation and function are readily derived. Consider a pair of scissors: you can see that the number of possible actions is limited. The holes are clearly there to put something into, and the only logical things that will fit are fingers. The holes are both affordances—they allow the fingers to be inserted—and signifiers—they indicate where the fingers are to go. The sizes of the holes provide constraints to limit the possible fingers: a big hole suggests several fingers; a small hole, only one. The mapping between holes and fingers—the set of possible operations—is signified and constrained by the holes. Moreover, the operation is not sensitive to finger placement: if you use the wrong fingers (or the wrong hand), the scissors still work, although not as comfortably. You can figure out the scissors because their operating parts are visible and the implications clear. The conceptual model is obvious, and there is effective use of signifiers, affordances, and constraints.

FIGURE 1.8

Junghans Mega 1000 Digital Radio Controlled Watch. There is no good conceptual model for understanding the operation of my watch. It has five buttons with no hints as to what each one does. And yes, the buttons do different things in their different modes. But it is a very nice-looking watch, and always has the exact time because it checks official radio time stations. (The top row of the display is the date: Wednesday, February 20, the eighth week of the year.) (Photograph by the author.)

What happens when the device does not suggest a good conceptual model? Consider my digital watch with five buttons: two along the top, two along the bottom, and one on the left side (Figure 1.8). What is each button for? How would you set the time? There is no way to tell—no evident relationship between the operating controls and the functions, no constraints, no apparent mappings. Moreover, the buttons have multiple ways of being used. Two of the buttons do different things when pushed quickly or when kept depressed for several seconds. Some operations require simultaneous depression of several of the buttons. The only way to tell how to work the watch is to read the manual, over and over again. With the scissors, moving the handle makes the blades move. The watch provides no visible relationship between the buttons and the possible actions, no discernible relationship between the actions and the end results. I really like the watch: too bad I can’t remember all the functions.

Conceptual models are valuable in providing understanding, in predicting how things will behave, and in figuring out what to do when things do not go as planned. A good conceptual model allows us to predict the effects of our actions. Without a good model, we operate by rote, blindly; we do operations as we were told to do them; we can’t fully appreciate why, what effects to expect, or what to do if things go wrong. As long as things work properly, we can manage. When things go wrong, however, or when we come upon a novel situation, then we need a deeper understanding, a good model.

For everyday things, conceptual models need not be very complex. After all, scissors, pens, and light switches are pretty simple devices. There is no need to understand the underlying physics or chemistry of each device we own, just the relationship between the controls and the outcomes. When the model presented to us is inadequate or wrong (or, worse, nonexistent), we can have difficulties. Let me tell you about my refrigerator.

I used to own an ordinary, two-compartment refrigerator—nothing very fancy about it. The problem was that I couldn’t set the temperature properly. There were only two things to do: adjust the temperature of the freezer compartment and adjust the temperature of the fresh food compartment. And there were two controls, one labeled “freezer,” the other “refrigerator.” What’s the problem? Oh, perhaps I’d better warn you. The two controls are not independent. The freezer control also affects the fresh food temperature, and the fresh food control also affects the freezer. Moreover, the manual warns that one should “always allow twenty-four (24) hours for the temperature to stabilize whether setting the controls for the first time or making an adjustment.”

FIGURE 1.9

Refrigerator Controls. Two compartments—fresh food and freezer—and two controls (in the fresh food unit). Your task: Suppose the freezer is too cold, the fresh food section just right. How would you adjust the controls so as to make the freezer warmer and keep the fresh food the same? (Photograph by the author.)

It was extremely difficult to regulate the temperature of my old refrigerator. Why? Because the controls suggest a false conceptual model. Two compartments, two controls, which implies that each control is responsible for the temperature of the compartment that carries its name: this conceptual model is shown in Figure 1.10A. It is wrong. In fact, there is only one thermostat and only one cooling mechanism. One control adjusts the thermostat setting, the other the relative proportion of cold air sent to each of the two compartments of the refrigerator. This is why the two controls interact: this conceptual model is shown in Figure 1.10B. In addition, there must be a temperature sensor, but there is no way of knowing where it is located. With the conceptual model suggested by the controls, adjusting the temperatures is almost impossible and always frustrating. Given the correct model, life would be much easier. Why did the manufacturer suggest the wrong conceptual model? We will never know. In the twenty-five years since the publication of the first edition of this book, I have had many letters from people thanking me for explaining their confusing refrigerator, but never any communication from the manufacturer (General Electric). Perhaps the designers thought the correct model was too complex, that the model they were giving was easier to understand. But with the wrong conceptual model, it was impossible to set the controls. And even though I am convinced I knew the correct model, I still couldn’t accurately adjust the temperatures because the refrigerator design made it impossible to discover which control was for the temperature sensor, which for the relative proportion of cold air, and in which compartment the sensor was located. The lack of immediate feedback for the actions did not help: it took twenty-four hours to see whether the new setting was appropriate. I shouldn’t have to keep a laboratory notebook and do controlled experiments just to set the temperature of my refrigerator.

FIGURE 1.10

Two Conceptual Models for a Refrigerator. The conceptual model A is provided by the system image of the refrigerator as gleaned from the controls. Each control determines the temperature of the named part of the refrigerator. This means that each compartment has its own temperature sensor and cooling unit. This is wrong. The correct conceptual model is shown in B. There is no way of knowing where the temperature sensor is located so it is shown outside the refrigerator. The freezer control determines the freezer temperature (so is this where the sensor is located?). The refrigerator control determines how much of the cold air goes to the freezer and how much to the refrigerator.
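The interaction between the two controls can be made concrete with a toy version of the correct model (Figure 1.10B). All numbers here are invented; the point is only that each control affects both compartments:

```python
def temperatures(thermostat, valve):
    """Toy model of the one-thermostat refrigerator of Figure 1.10B.

    thermostat (0..1) sets the total cooling; valve (0..1) is the
    fraction of cold air sent to the freezer. The constants and the
    linear form are hypothetical, chosen only for illustration.
    """
    cooling = 40.0 * thermostat
    freezer = 20.0 - cooling * valve               # degrees, invented scale
    fresh_food = 20.0 - cooling * (1.0 - valve)
    return freezer, fresh_food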
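placeholder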

I am happy to say that I no longer own that refrigerator. In stead I have one that has two separate controls, one in the fresh food compartment, one in the freezer compartment. Each control is nicely calibrated in degrees and labeled with the name of the compartment it controls. The two compartments are independent: setting the temperature in one has no effect on the temperature in the other. This solution, although ideal, does cost more. But far less expensive solutions are possible. With today’s inexpensive sensors and motors, it should be possible to have a single cooling unit with a motor-controlled valve controlling the relative proportion of cold air diverted to each compartment. A simple, inexpensive computer chip could regulate the cooling unit and valve position so that the temperatures in the two compartments match their targets. A bit more work for the engineering design team? Yes, but the results would be worth it. Alas, General Electric is still selling refrigerators with the very same controls and mechanisms that cause so much confusion. The photograph in Figure 1.9 is from a contemporary refrigerator, photographed in a store while preparing this book.

The System Image

People create mental models of themselves, others, the environ ment, and the things with which they interact. These are concep tual models formed through experience, training, and instruction. These models serve as guides to help achieve our goals and in un derstanding the world.

How do we form an appropriate conceptual model for the de vices we interact with? We cannot talk to the designer, so we rely upon whatever information is available to us: what the device looks like, what we know from using similar things in the past, what was told to us in the sales literature, by salespeople and ad vertisements, by articles we may have read, by the product website and instruction manuals. I call the combined information available to us the system image. When the system image is incoherent or in appropriate, as in the case of the refrigerator, then the user cannot easily use the device. If it is incomplete or contradictory, there will be trouble.

As illustrated in Figure 1.11, the designer of the product and the person using the product form somewhat disconnected vertices of a triangle. The designer’s conceptual model is the designer’s con ception of the product, occupying one vertex of the triangle. The product itself is no longer with the designer, so it is isolated as a second vertex, perhaps sitting on the user’s kitchen counter. The system image is what can be perceived from the physical struc ture that has been built (including documentation, instructions, signifiers, and any information available from websites and help lines). The user’s conceptual model comes from the system image, through interaction with the product, reading, searching for online information, and from whatever manuals are provided. The de signer expects the user’s model to be identical to the design model, but because designers cannot communicate directly with users, the entire burden of communication is on the system image.

The Designer’s Model, the User’s Model, and the System Im

FIGURE . 1.11

age. The designer’s conceptual model is the designer’s conception of the look, feel, and operation of a product. The system image is what can be derived from the physical structure that has been built (including documentation). The user’s mental model is developed through in teraction with the product and the system image. Designers expect the user’s model to be identical to their own, but because they cannot communicate directly with the user, the burden of communication is with the system image.

Figure 1.11 indicates why communication is such an important aspect of good design. No matter how brilliant the product, if peo ple cannot use it, it will receive poor reviews. It is up to the de signer to provide the appropriate information to make the product understandable and usable. Most important is the provision of a good conceptual model that guides the user when thing go wrong. With a good conceptual model, people can figure out what has happened and correct the things that went wrong. Without a good model, they struggle, often making matters worse. Good conceptual models are the key to understandable, enjoy able products: good communication is the key to good conceptual models.

The Paradox of Technology

Technology offers the potential to make life easier and more en joyable; each new technology provides increased benefits. At the same time, added complexities increase our difficulty and frustra tion with technology. The design problem posed by technological advances is enormous. Consider the wristwatch. A few decades ago, watches were simple. All you had to do was set the time and keep the watch wound. The standard control was the stem: a knob at the side of the watch. Turning the knob would wind the spring that provided power to the watch movement. Pulling out the knob and turning it rotated the hands. The operations were easy to learn and easy to do. There was a reasonable relationship between the turning of the knob and the resulting turning of the hands. The design even took into account human error. In its normal position, turning the stem wound the mainspring of the clock. The stem had to be pulled before it would engage the gears for setting the time. Accidental turns of the stem did no harm.

Watches in olden times were expensive instruments, manu factured by hand. They were sold in jewelry stores. Over time, with the introduction of digital technology, the cost of watches decreased rapidly, while their accuracy and reliability increased. Watches became tools, available in a wide variety of styles and shapes and with an ever-increasing number of functions. Watches were sold everywhere, from local shops to sporting goods stores to electronic stores. Moreover, accurate clocks were incorporated in many appliances, from phones to musical keyboards: many people no longer felt the need to wear a watch. Watches became inexpen sive enough that the average person could own multiple watches. They became fashion accessories, where one changed the watch with each change in activity and each change of clothes. In the modern digital watch, instead of winding the spring, we change the battery, or in the case of a solar-powered watch, ensure that it gets its weekly dose of light. The technology has allowed more functions: the watch can give the day of the week, the month, and the year; it can act as a stopwatch (which itself has several functions), a countdown timer, and an alarm clock (or two); it has the ability to show the time for different time zones; it can act as a counter and even as a calculator. My watch, shown in Figure 1.8, has many functions. It even has a radio receiver to allow it to set its time with official time stations around the world. Even so, it is far less complex than many that are available. Some watches have built-in compasses and barometers, accelerometers, and tem perature gauges. Some have GPS and Internet receivers so they can display the weather and news, e-mail messages, and the lat est from social networks. Some have built-in cameras. Some work with buttons, knobs, motion, or speech. Some detect gestures. 
The watch is no longer just an instrument for telling time: it has become a platform for enhancing multiple activities and lifestyles.

The added functions cause problems: How can all these functions fit into a small, wearable size? There are no easy answers. Many people have solved the problem by not using a watch. They use their phone instead. A cell phone performs all the functions much better than the tiny watch, while also displaying the time. Now imagine a future where instead of the phone replacing the watch, the two will merge, perhaps worn on the wrist, perhaps on the head like glasses, complete with display screen. The phone, watch, and components of a computer will all form one unit. We will have flexible displays that show only a tiny amount of information in their normal state, but that can unroll to considerable size. Projectors will be so small and light that they can be built into watches or phones (or perhaps rings and other jewelry), projecting their images onto any convenient surface. Or perhaps our devices won't have displays, but will quietly whisper the results into our ears, or simply use whatever display happens to be available: the display in the seatback of cars or airplanes, hotel room televisions, whatever is nearby. The devices will be able to do many useful things, but I fear they will also frustrate: so many things to control, so little space for controls or signifiers. The obvious solution is to use exotic gestures or spoken commands, but how will we learn, and then remember, them? As I discuss later, the best solution is for there to be agreed-upon standards, so we need learn the controls only once. But as I also discuss, agreeing upon these is a complex process, with many competing forces hindering rapid resolution. We will see.

The same technology that simplifies life by providing more functions in each device also complicates life by making the device harder to learn, harder to use. This is the paradox of technology and the challenge for the designer.

The Design Challenge

Design requires the cooperative efforts of multiple disciplines. The number of different disciplines required to produce a successful product is staggering. Great design requires great designers, but that isn't enough: it also requires great management, because the hardest part of producing a product is coordinating all the many, separate disciplines, each with different goals and priorities. Each discipline has a different perspective of the relative importance of the many factors that make up a product. One discipline argues that it must be usable and understandable, another that it must be attractive, yet another that it has to be affordable. Moreover, the device has to be reliable, be able to be manufactured and serviced. It must be distinguishable from competing products and superior in critical dimensions such as price, reliability, appearance, and the functions it provides. Finally, people have to actually purchase it. It doesn't matter how good a product is if, in the end, nobody uses it.

Quite often each discipline believes its distinct contribution to be most important: “Price,” argues the marketing representative, “price plus these features.” “Reliable,” insist the engineers. “We have to be able to manufacture it in our existing plants,” say the manufacturing representatives. “We keep getting service calls,” say the support people; “we need to solve those problems in the design.” “You can’t put all that together and still have a reasonable product,” says the design team. Who is right? Everyone is right. The successful product has to satisfy all these requirements.

The hard part is to convince people to understand the viewpoints of the others, to abandon their disciplinary viewpoint and to think of the design from the viewpoints of the person who buys the product and those who use it, often different people. The viewpoint of the business is also important, because it does not matter how wonderful the product is if not enough people buy it. If a product does not sell, the company must often stop producing it, even if it is a great product. Few companies can sustain the huge cost of keeping an unprofitable product alive long enough for its sales to reach profitability—with new products, this period is usually measured in years, and sometimes, as with the adoption of high-definition television, decades.

Designing well is not easy. The manufacturer wants something that can be produced economically. The store wants something that will be attractive to its customers. The purchaser has several demands. In the store, the purchaser focuses on price and appearance, and perhaps on prestige value. At home, the same person will pay more attention to functionality and usability. The repair service cares about maintainability: how easy is the device to take apart, diagnose, and service? The needs of those concerned are different and often conflict. Nonetheless, if the design team has representatives from all the constituencies present at the same time, it is often possible to reach satisfactory solutions for all the needs. It is when the disciplines operate independently of one another that major clashes and deficiencies occur. The challenge is to use the principles of human-centered design to produce positive results, products that enhance lives and add to our pleasure and enjoyment. The goal is to produce a great product, one that is successful, and that customers love. It can be done.

Human-Computer Interaction: An Empirical Research Perspective

I. Scott MacKenzie

1 Historical Context

Human-computer interaction. In the beginning, there were humans. In the 1940s came computers. Then in the 1980s came interaction. Wait! What happened between 1940 and 1980? Were humans not interacting with computers then? Well, yes, but not just any human. Computers in those days were too precious, too complicated, to allow the average human to mess with them. Computers were carefully guarded. They lived a secluded life in large air-conditioned rooms with raised floors and locked doors in corporate or university research labs or government facilities. The rooms often had glass walls to show off the unique status of the behemoths within.

If you were of that breed of human who was permitted access, you were probably an engineer or a scientist—specifically, a computer scientist. And you knew what to do. Whether it was connecting relays with patch cords on an ENIAC (1940s), changing a magnetic memory drum on a UNIVAC (1950s), adjusting the JCL stack on a System/360 (1960s), or greping and awking around the unix command set on a PDP-11 (1970s), you were on home turf. Unix commands like grep, for global regular expression print, were obvious enough. Why consult the manual? You probably wrote it! As for unix's vi editor, if some poor soul was stupid enough to start typing text while in command mode, well, he got what he deserved. 1 Who gave him a login account, anyway? And what's all this talk about make the state of the system visible to the user? What user? Sounds a bit like … well … socialism!

Interaction was not on the minds of the engineers and scientists who designed, built, configured, and programmed the early computers. But by the 1980s interaction was an issue. The new computers were not only powerful, they were usable—by anyone! With usability added, computers moved from their earlier secure confines onto people's desks in workplaces and, more important, into people's homes. One reason human–computer interaction (HCI) is so exciting is that the field's emergence and progress are aligned with, and in good measure responsible for, this dramatic shift in computing practices.

1 One of the classic UI foibles—told and re-told by HCI educators around the world—is the vi editor’s lack of feedback when switching between modes. Many a user made the mistake of providing input while in command mode or entering a command while in input mode.

Human-Computer Interaction.

© 2013 Elsevier Inc. All rights reserved.


This book is about research in human-computer interaction. As in all fields, research in HCI is the force underlying advances that migrate into products and processes that people use, whether for work or pleasure. While HCI itself is broad and includes a substantial applied component—most notably in design—the focus in this book is narrow. The focus is on research—the what, the why, and the how—with a few stories to tell along the way.

Many people associate research in HCI with developing a new or improved interaction or interface and testing it in a user study. The term "user study" sometimes refers to an informal evaluation of a user interface. But this book takes a more formal approach, where a user study is "an experiment with human participants." HCI experiments are discussed throughout the book. The word empirical is added to this book's title to give weight to the value of experimental research. The research espoused here is empirical because it is based on observation and experience and is carried out and reported on in a manner that allows results to be verified or refuted through the efforts of other researchers. In this way, each item of HCI research joins a large body of work that, taken as a whole, defines the field and sets the context for applying HCI knowledge in real products or processes.

1.1 Introduction

Although HCI emerged in the 1980s, it owes a lot to older disciplines. The most central of these is the field of human factors, or ergonomics. Indeed, the name of the preeminent annual conference in HCI—the Association for Computing Machinery Conference on Human Factors in Computing Systems (ACM SIGCHI)—uses that term. SIGCHI is the special interest group on computer-human interaction sponsored by the ACM. 2

Human factors is both a science and a field of engineering. It is concerned with human capabilities, limitations, and performance, and with the design of systems that are efficient, safe, comfortable, and even enjoyable for the humans who use them. It is also an art in the sense of respecting and promoting creative ways for practitioners to apply their skills in designing systems. One need only change systems in that statement to computer systems to make the leap from human factors to HCI. HCI, then, is human factors, but narrowly focused on human interaction with computing technology of some sort.

That said, HCI itself does not feel "narrowly focused." On the contrary, HCI is tremendously broad in scope. It draws upon interests and expertise in disciplines such as psychology (particularly cognitive psychology and experimental psychology), sociology, anthropology, cognitive science, computer science, and linguistics.

2 The Association for Computing Machinery (ACM), founded in 1947, is the world's leading educational and scientific computing society, with over 95,000 members. The ACM is organized into over 150 special interest groups, or "SIGs." Among the services offered is the ACM Digital Library, a repository of online publications which includes 45+ ACM journals, 85+ ACM conference proceedings, and numerous other publications from affiliated organizations. See www.acm.org.


FIGURE 1.1

Timeline of notable events in the history of human–computer interaction (HCI).

Figure 1.1 presents a timeline of a few notable events leading to the birth and emergence of HCI as a field of study, beginning in the 1940s.

1.2 Vannevar Bush’s “as we may think” (1945)

Vannevar Bush’s prophetic essay “As We May Think,” published in the Atlantic Monthly in July, 1945 (Bush, 1945), is required reading in many HCI courses even today. The article has garnered 4,000+ citations in scholarly publications. 3 Attesting to the importance of Bush’s vision to HCI is the 1996 reprint of the entire essay in the ACM’s interactions magazine, complete with annotations, sketches, and biographical notes.

Bush (see Figure 1.2) was the U.S. government's Director of the Office of Scientific Research and a scientific advisor to President Franklin D. Roosevelt. During World War II, he was charged with leading some 6,000 American scientists in the application of science to warfare. But Bush was keenly aware of the possibilities that lay ahead in peacetime in applying science to more lofty and humane

3 Google Scholar search using author: “v bush.”

FIGURE 1.2

Vannevar Bush at work (circa 1940–1944).

pursuits. His essay concerned the dissemination, storage, and access to scholarly knowledge. Bush wrote:

the summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships (p. 37).4

Aside from the reference to antiquated square-rigged ships, what Bush says we can fully relate to today, especially his mention of the expanding human experience in relation to HCI. For most people, nothing short of Olympian talent is needed to keep abreast of the latest advances in the information age. Bush's consequent maze is today's information overload or lost in hyperspace. Bush's momentarily important item sounds a bit like a blog posting or a tweet. Although blogs and tweets didn't exist in 1945, Bush clearly anticipated them.

Bush proposed navigating the knowledge maze with a device he called memex. Among the features of memex is associative indexing, whereby points of interest can be connected and joined so that selecting one item immediately and automatically selects another: "When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard" (Bush, 1945, p. 44). This sounds like a description of hyperlinks and bookmarks. Although today it is easy to equate memex with hypertext and the World Wide Web, Bush's inspiration for this idea came from the contemporary telephone exchange, which he described as a "spider web of metal, sealed in a thin glass container" (viz. vacuum tubes) (p. 38). The maze of connections in a telephone exchange gave rise to Bush's more general theme of a spider web of connections for the information in one's mind, linking one's experiences.

It is not surprising that some of Bush's ideas, for instance, dry photography, today seem naïve. Yet the ideas are naïve only when juxtaposed with Bush's

4 For convenience, page references are to the March 1996 reprint in the ACM’s interactions.


FIGURE 1.3

(a) Demo of Ivan Sutherland’s Sketchpad. (b) A light pen dragging (“rubber banding”) lines, subject to constraints.

brilliant foretelling of a world we are still struggling with and are still fine-tuning and perfecting.

1.3 Ivan Sutherland’s Sketchpad (1962)

Ivan Sutherland developed Sketchpad in the early 1960s as part of his PhD research in electrical engineering at the Massachusetts Institute of Technology (M.I.T.). Sketchpad was a graphics system that supported the manipulation of geometric shapes and lines (objects) on a display using a light pen. To appreciate the inferior usability of the computers available to Sutherland at the time of his studies, consider these introductory comments in a paper he published in 1963:

Heretofore, most interaction between man and computers has been slowed by the need to reduce all communication to written statements that can be typed. In the past we have been writing letters to, rather than conferring with, our computers (Sutherland, 1963, p. 329).

With Sketchpad, commands were not typed. Users did not “write letters to” the computer. Instead, objects were drawn, resized, grabbed and moved, extended, deleted—directly, using the light pen (see Figure 1.3). Object manipulations worked with constraints to maintain the geometric relationships and properties of objects.

The use of a pointing device for input makes Sketchpad the first direct manipulation interface—a sign of things to come. The term "direct manipulation" was coined many years later by Ben Shneiderman at the University of Maryland to provide a psychological context for a suite of related features that naturally came together in this new genre of human–computer interface (Shneiderman, 1983). These features included visibility of objects, incremental action, rapid feedback, reversibility, exploration, syntactic correctness of all actions, and replacing language with action. While Sutherland's Sketchpad was one of the earliest examples of a direct manipulation system, others soon followed, most notably the Dynabook concept system by Alan Kay of the Xerox Palo Alto Research Center (PARC) (Kay and Goldberg, 1977). I will say more about Xerox PARC throughout this chapter.

Sutherland's work was presented at the Institute of Electrical and Electronics Engineers (IEEE) conference in Detroit in 1963 and subsequently published in its proceedings (Sutherland, 1963). The article is available in the ACM Digital Library (http://portal.acm.org). Demo videos of Sketchpad are available on YouTube (www.youtube.com). Not surprisingly, a user study of Sketchpad was not conducted, since Sutherland was a student of electrical engineering. Had his work taken place in the field of industrial engineering (where human factors is studied), user testing would have been more likely.

1.4 Invention of the mouse (1963)

If there is one device that symbolizes the emergence of HCI, it is the computer mouse. Invented by Douglas Engelbart in 1963, the mouse was destined to fundamentally change the way humans interact with computers. 5 Instead of typing commands, a user could manipulate a mouse to control an on-screen tracking symbol, or cursor. With the cursor positioned over a graphic image representing the command, the command is issued with a select operation—pressing and releasing a button on the mouse.

Engelbart was among a group of researchers at the Stanford Research Institute (SRI) in Menlo Park, California. An early hypertext system called NLS, for oN Line System, was the project for which an improved pointing device was needed. Specifically, the light pen needed to be replaced. The light pen was an established technology, but it was awkward. The user held the pen in the air in front of the display. After a few minutes of interaction, fatigue would set in. A more natural and comfortable device might be something on the desktop, something in close proximity to the keyboard. The keyboard is where the user's hands are normally situated, so a device beside the keyboard made the most sense. Engelbart's invention met this requirement.

The first prototype mouse is seen in Figure 1.4a. The device included two potentiometers positioned at right angles to each other. Large metal wheels were attached to the shafts of the potentiometers and protruded slightly from the base of the housing. The wheels rotated as the device was moved across a surface. Side-to-side motion rotated one wheel; to-and-fro motion rotated the other. With diagonal movement, both wheels rotated, in accordance with the amount of movement in each direction. The amount of rotation of each wheel altered the voltage at the wiper terminal of the potentiometer. The voltages were passed on to the host system for processing. The x and y positions of an on-screen object or cursor were indirectly

5 Engelbart's patent for the mouse was filed on June 21, 1967 and issued on November 17, 1970 (Engelbart, 1970). U.S. patent laws allow one year between public disclosure and filing; thus, it can be assumed that prior to June 21, 1966, Engelbart's invention was not disclosed to the public.

FIGURE 1.4

(a) The first mouse. (b) Inventor Douglas Engelbart holding his invention in his left hand and an early three-button variation in his right hand.

controlled by the two voltage signals. In Figure 1.4a, a selection button can be seen under the user's index finger. In Figure 1.4b, Engelbart is shown with his invention in his left hand and a three-button version of a mouse, which was developed much later, in his right.
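The signal chain just described—wheel rotation to potentiometer voltage to cursor motion—can be sketched in a few lines of code. This is an illustrative reconstruction only; the function name and scale factor are assumptions for the example, not details of Engelbart's prototype.

```python
# A minimal sketch of the mouse's position encoding. The scale factor
# (screen units per degree of wheel rotation) is an illustrative assumption.

def wheels_to_cursor(x_rotation_deg, y_rotation_deg, units_per_degree=0.5):
    """Convert rotations of the two orthogonal wheels into a cursor delta.

    Each wheel drives a potentiometer; the change in wiper voltage is
    proportional to the rotation, and the host maps that change to
    screen units.
    """
    dx = x_rotation_deg * units_per_degree
    dy = y_rotation_deg * units_per_degree
    return dx, dy

# Diagonal movement rotates both wheels, in proportion to the movement
# in each direction.
cursor = [100.0, 100.0]
dx, dy = wheels_to_cursor(40, -20)  # side-to-side and to-and-fro rotation
cursor[0] += dx
cursor[1] += dy
print(cursor)  # [120.0, 90.0]
```

The indirection is the point: the device reports relative motion, and the on-screen cursor accumulates it, which is why a tracking symbol was needed at all.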

Initial testing of the mouse focused on selecting and manipulating text, rather than drawing and manipulating graphic objects. Engelbart was second author of the first published evaluation of the mouse. This was, arguably, HCI's first user study, so a few words are in order here. Engelbart, along with English and Berman, conducted a controlled experiment comparing several input devices capable of both selection and x-y position control of an on-screen cursor (English, Engelbart, and Berman, 1967). Besides the mouse, the comparison included a light pen, a joystick, a knee-controlled lever, and a Grafacon. The joystick (Figure 1.5a) had a moving stick and was operated in two control modes. In absolute or position-control mode, the cursor's position on the display had an absolute correspondence to the position of the stick. In rate-control mode, the cursor's velocity was determined by the amount of stick deflection, while the direction of the cursor's motion was determined by the direction of the stick. An embedded switch was included for selection and was activated by pressing down on the stick.

The light pen (Figure 1.5b) was operated much like the pen used by Sutherland (see Figure 1.3). The device was picked up and moved to the display surface with the pen pointing at the desired object. A projected circle of orange light indicated the target to the lens system. Selection involved pressing a switch on the barrel of the pen.

The knee-controlled lever (Figure 1.5c) was connected to two potentiometers. Side-to-side knee motion controlled side-to-side (x-axis) cursor movement; up-and-down knee motion controlled up-and-down (y-axis) cursor movement. Up-and-down knee motion was achieved by a "rocking motion on the ball of the foot" (p. 7). The device did not include an integrated method for selection. Instead, a key on the system's keyboard was used.

FIGURE 1.5

Additional devices used in the first comparative evaluation of a mouse: (a) Joystick. (b) Lightpen. (c) Knee-controlled lever. (d) Grafacon.

(Source: a, b, d, adapted from English et al., 1967; c, © 1967 IEEE. Reprinted with permission)

The Grafacon (Figure 1.5d) was a commercial device used for tracing curves. As noted, the device consisted "of an extensible arm connected to a linear potentiometer, with the housing for the linear potentiometer pivoted on an angular potentiometer" (English et al., 1967, p. 6). Originally, there was a pen at the end of the arm; however, this was replaced with a knob-and-switch assembly (see Figure 1.5). The user gripped the knob and moved it about to control the on-screen cursor. Pressing the knob caused a selection.

The knee-controlled lever and Grafacon are interesting alternatives to the mouse. They illustrate and suggest the processes involved in empirical research. It is not likely that Engelbart simply woke up one morning and invented the mouse. While it may be true that novel ideas sometimes arise through "eureka" moments, typically there is more to the process of invention. Refining ideas—deciding what works and what doesn't—is an iterative process that involves a good deal of trial and error. No doubt, Engelbart and colleagues knew from the outset that they needed a device that would involve some form of human action as input and would produce two channels (x-y) of analog positioning data as output. A select operation was also needed to produce a command or generate closure at the end of a positioning operation. Of course, we know this today as a point-select, or point-and-click, operation. Operating the device away from the display meant some form of on-screen tracker (a cursor) was needed to establish correspondence between the device space and the display space. While this seems obvious today, it was a newly emerging form of human-to-computer interaction in the 1960s.

In the comparative evaluation, English et al. (1967) measured users' access time (the time to move the hand from the keyboard to the device) and motion time (the time from the onset of cursor movement to the final selection). The evaluation included 13 participants (eight experienced in working with the devices and three inexperienced). For each trial, a character target (with surrounding distracter targets) appeared on the display. The trial began with the participant pressing and releasing the spacebar on the system's keyboard, whereupon a cursor appeared on the display. The participant moved his or her hand to the input device and then manipulated the device to move the cursor to the target. With the cursor over the target, a selection was made using the method associated with the device. Examples of the test results from the inexperienced participants are shown in Figure 1.6. Each bar represents the mean for ten sequences; each sequence consisted of eight target patterns. Results are shown for the mean task completion time (Figure 1.6a) and error rate (Figure 1.6b), where the error rate is the ratio of missed target selections to all selections.

While it might appear that the knee-controlled lever is the best device in terms of time, each bar in Figure 1.6a includes both the access time and the motion time. The access time for the knee-controlled lever is, of course, zero. The authors noted that considering motion time only, the knee-controlled lever "no longer shows up so favorably" (p. 12). At 2.43 seconds per trial, the light pen had a slight advantage over the mouse at 2.62 seconds per trial; however, this must be viewed with consideration for the inevitable discomfort in continued use of a light pen, which is operated in the air at the surface of the display. Besides, the mouse was the clear winner in terms of accuracy. The mouse error rate was less than half that of any other device condition in the evaluation (see Figure 1.6b).
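The two measures can be made concrete with a short computation. The trial values below are invented for the example; only the relationships follow the study's definitions: total time is access time plus motion time, and error rate is the ratio of missed selections to all selections.

```python
# Illustrative re-computation of the measures reported by English et al.
# (1967). The trial data are made up; the formulas follow the text.

def summarize(trials):
    """trials: list of (access_time_s, motion_time_s, hit) tuples.

    Returns (mean total time, mean motion time, error rate).
    """
    n = len(trials)
    mean_total = sum(a + m for a, m, _ in trials) / n
    mean_motion = sum(m for _, m, _ in trials) / n
    error_rate = sum(1 for *_, hit in trials if not hit) / n
    return mean_total, mean_motion, error_rate

# Hypothetical data: a knee lever has zero access time but slower, less
# accurate motion; a mouse costs some access time but moves accurately.
knee = [(0.0, 2.6, True), (0.0, 2.8, True), (0.0, 2.4, False), (0.0, 2.7, True)]
mouse = [(0.4, 2.2, True), (0.4, 2.3, True), (0.35, 2.1, True), (0.45, 2.4, True)]

print(summarize(knee))   # total time looks good only because access time is zero
print(summarize(mouse))
```

Separating the two components is exactly what let the authors observe that the knee lever "no longer shows up so favorably" once access time is excluded.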

The mouse evaluation by English et al. (1967) marks an important milestone in empirical research in HCI. The methodology was empirical and the write-up included most of what is expected today in a user study destined for presentation at a conference and publication in a conference proceedings. For example, the write-up contained a detailed description of the participants, the apparatus, and the procedure. The study could be reproduced if other researchers wished to verify or refute the findings. Of course, reproducing the evaluation today would be difficult, as the devices are no longer available. The evaluation included an independent variable, input method, with six levels: mouse, light pen, joystick (position-control), joystick (rate-control), knee-controlled lever, and Grafacon. There were two dependent variables, task completion time and error rate. The order of administering the device conditions was different for each participant, a practice known today as counterbalancing. While testing for statistically significant differences using an analysis of variance (ANOVA) was not done, it is important to remember that the authors did not have at their disposal the many tools taken for granted today, such as spreadsheets and statistics applications.
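Counterbalancing can be sketched concretely. A balanced Latin square is one standard construction for varying condition order across participants; the device names come from the study, but the code is a modern illustration, not something the authors used.

```python
# A sketch of counterbalancing with a balanced Latin square: each
# condition appears once per row, and (for an even number of conditions)
# each condition precedes every other condition equally often.

def balanced_latin_square(conditions):
    """Return one ordering of the conditions per participant row."""
    n = len(conditions)
    # Canonical first row: 0, 1, n-1, 2, n-2, 3, ...
    first = [((i + 1) // 2) % n if i % 2 else (-(i // 2)) % n for i in range(n)]
    # Each subsequent row shifts every condition index by one.
    return [[conditions[(f + r) % n] for f in first] for r in range(n)]

devices = ["mouse", "light pen", "joystick (position)",
           "joystick (rate)", "knee lever", "Grafacon"]
for participant, order in enumerate(balanced_latin_square(devices), 1):
    print(participant, order)
```

With 13 participants and 6 conditions, an experimenter today might cycle through the square's rows; the point is simply that each participant meets the devices in a different, systematically varied order so that practice and fatigue effects do not favor any one device.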

The next published comparative evaluation involving a mouse was by Card, English, and Burr (1978), about 10 years later. Card et al.'s work was carried out at Xerox PARC and was part of a larger effort that eventually produced the first windows-based graphical user interface, or GUI (see next section). The mouse

FIGURE 1.6

Results of first comparative evaluation of a computer mouse: (a) Task completion time in seconds. (b) Error rate as the ratio of missed selections to all selections.

(Adapted from English et al., 1967)

underwent considerable refining and reengineering at PARC. Most notably, the potentiometer wheels were replaced with a rolling ball assembly, developed by Rider (1974). The advantage of the refined mouse over competing devices was reconfirmed by Card et al. (1978) and has been demonstrated in countless comparative evaluations since and throughout the history of HCI. It was becoming clear that Engelbart's invention was changing the face of human-computer interaction.

Years later, Engelbart would receive the ACM Turing Award (1997) and the ACM SIGCHI Lifetime Achievement Award (1998; 1st recipient). It is interesting that Engelbart’s seminal invention dates to the early 1960s, yet commercialization of the mouse did not occur until 1981, when the Xerox Star was launched.

1.5 Xerox Star (1981)

There was a buzz around the floor of the National Computer Conference (NCC) in May 1981. In those days, the NCC was the yearly conference for computing. It was both a gathering of researchers (sponsored by the American Federation of Information Processing Societies, or AFIPS) and a trade show. The trade show was huge. 6 All the players were there. There were big players, like IBM, and little players, like Qupro Data Systems of Kitchener, Ontario, Canada. I was there, "working the booth" for Qupro. Our main product was a small desktop computer system based on a single-board computer known as the Pascal MicroEngine.

The buzz at the NCC wasn't about Qupro. It wasn't about IBM, either. The buzz was about Xerox. "Have you been to the Xerox booth?" I would hear. "You gotta check it out. It's really cool." And indeed it was. The Xerox booth had a substantial crowd gathered around it throughout the duration of the conference. There were scripted demonstrations every hour or so, and the crowd was clearly excited by what they were seeing. The demos were of the Star, or the Xerox 8100 Star Information System, as it was formally named. The excitement was well deserved, as the 1981 launch of the Xerox Star at the NCC marks a watershed moment in the history of computing. The Star was the first commercially released computer system with a GUI. It had windows, icons, menus, and a pointing device (WIMP). It supported direct manipulation and what-you-see-is-what-you-get (WYSIWYG) interaction. The Star had what was needed to bring computing to the people.

The story of the Star began around 1970, when Xerox established its research center, PARC, in Palo Alto, California. The following year, Xerox signed an agreement with SRI licensing Xerox to use Engelbart's invention, the mouse (Johnson et al., 1989, p. 22). Over the next 10 years, development proceeded along a number of fronts. The most relevant development for this discussion is that of the Alto, the Star's predecessor, which began in 1973. The Alto also included a GUI and mouse. It was used widely at Xerox and at a few external test sites. However, the Alto was never released commercially—a missed opportunity on a grand scale, according to some (D. K. Smith and Alexander, 1988).

Figure 1.7 shows the Star workstation, which is unremarkable by today's standards. The graphical nature of the information on the system's display can be seen in the image. This was novel at the time. The display was bit-mapped, meaning images were formed by mapping bits in memory to pixels on the display. Most systems at the time used character-mapped displays, meaning the screen image was composed of sequences of characters, each limited to a fixed pattern (e.g., 7 × 10 pixels) retrieved from read-only memory. Character-mapped displays required considerably less memory, but limited the richness of the display image. The mouse—a two-button variety—sits beside the system's keyboard.
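The memory difference between the two schemes is easy to estimate. The Star's display is commonly cited as 1024 × 808 pixels at 1 bit per pixel; the character-cell figures below are assumptions for a typical terminal of the era, not Star specifications.

```python
# Back-of-the-envelope comparison of bit-mapped versus character-mapped
# display memory. Resolution and cell counts are illustrative assumptions.

BITS_PER_BYTE = 8

def bitmapped_bytes(width_px, height_px, bits_per_pixel=1):
    """RAM needed to hold one bit (or more) per pixel on the screen."""
    return width_px * height_px * bits_per_pixel // BITS_PER_BYTE

def charmapped_bytes(cols, rows, bytes_per_cell=1):
    """RAM needed for a character display: glyph patterns live in ROM,
    so RAM holds only one character code per cell."""
    return cols * rows * bytes_per_cell

print(bitmapped_bytes(1024, 808))  # about 100 KB for a Star-sized bitmap
print(charmapped_bytes(80, 24))    # about 2 KB for an 80 x 24 terminal
```

A difference of roughly fifty to one explains why bit-mapped displays were rare before memory prices fell, and why the richness of the Star's screen was so striking in 1981.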

6 Attendance figures for 1981 are unavailable, but the NCC was truly huge. In 1983, NCC attendance exceeded 100,000 (Abrahams, 1987).

FIGURE 1.7

Xerox Star workstation.

As the designers noted, the Star was intended as an office automation system (Johnson et al., 1989). Business professionals would have Star workstations on their desks and would use them to create, modify, and manage documents, graphics, tables, presentations, etc. The workstations were connected via high-speed Ethernet cables and shared centralized resources, such as printers and file servers. A key tenet in the Star philosophy was that workers wanted to get their work done, not fiddle with computers. Obviously, the computers had to be easy to use, or invisible, so to speak.

One novel feature of the Star was use of the desktop metaphor. Metaphors are important in HCI. When a metaphor is present, the user has a jump-start on knowing what to do. The user exploits existing knowledge from another domain. The desktop metaphor brings concepts from the office desktop to the system’s display. On the display the user finds pictorial representations (icons) for things like documents, folders, trays, and accessories such as a calculator, printer, or notepad. A few examples of the Star’s icons are seen in Figure 1.8. By using existing knowledge of a desktop, the user has an immediate sense of what to do and how things work. The Star designers, and others since, pushed the limits of the metaphor to the point where it is now more like an office metaphor than a desktop metaphor. There are windows, printers, and a trashcan on the display, but of course these artifacts are not found on an office desktop. However, the metaphor seemed to work, as we hear even today that the GUI is an example of the desktop metaphor. I will say more about metaphors in Chapter 3.

In making the system usable (invisible), the Star developers created interactions that deal with files, not programs. So users “open a document,” rather than “invoke an editor.” This means that files are associated with applications, but these details are hidden from the user. Opening a spreadsheet document launches the spreadsheet application, while opening a text document opens a text editor.

FIGURE 1.8

Examples of icons appearing on the Xerox Star desktop.

(Adapted from Smith, Irby, Kimball, and Harslem, 1982)

With a GUI and point-select interaction, the Star interface was the archetype of direct manipulation. The enabling work on graphical interaction (e.g., Sutherland) and pointing devices (e.g., Engelbart) was complete. By comparison, previous command-line interfaces had a single channel of input. For every action, a command was needed to invoke it. The user had to learn and remember the syntax of the system’s commands and type them in to get things done. Direct manipulation systems, like the Star, have numerous input channels, and each channel has a direct correspondence to a task. Furthermore, interaction with the channel is tailored to the properties of the task. A continuous property, such as display brightness or sound volume, has a continuous control, such as a slider. A discrete property, such as font size or family, has a discrete control, such as a multi-position switch or a menu item. Each control also has a dedicated location on the display and is engaged using a direct point-select operation. Johnson et al. (1989, p. 14) compare direct manipulation to driving a car. A gas pedal controls the speed, a lever controls the wiper blades, a knob controls the radio volume. Each control is a dedicated channel, each has a dedicated location, and each is operated according to the property it controls.

When operating a car, the driver can adjust the radio volume and then turn on the windshield wipers. Or the driver can first turn on the windshield wipers and then adjust the radio volume. The car is capable of responding to the driver’s inputs in any order, according to the driver’s wishes. In computing, direct manipulation brings the same flexibility. This is no small feat. Command-line interfaces, by comparison, are simple. They follow a software paradigm known as sequential programming. Every action occurs in a sequence under the system’s control. When the system needs a specific input, the user is prompted to enter it. Direct manipulation interfaces require a different approach because they must accept the user’s actions according to the user’s wishes. While manipulating hello in a text editor, for example, the user might change the font to Courier (hello) and then change the style to bold (hello). Or the user might first set the style to bold (hello) and then change the font to Courier (hello). The result is the same, but the order of actions differs. The point here is that the user is in control, not the system. To support this, direct manipulation systems are designed using a software paradigm known as event-driven programming, which is substantially more complicated than sequential programming. Although event-driven programming was not new (it was, and still is, used in process-control to respond to sensor events), designing systems that responded asynchronously to user events was new in the early 1970s when work began on the Star. Of course, from the user’s perspective, this detail is irrelevant (remember the invisible computer). We mention it here only to give credit to the Herculean effort that was invested in designing the Star and bringing it to market.
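The contrast between the two paradigms can be sketched in a few lines of Python. This is a toy dispatcher with hypothetical names, not Star code: in the sequential style the program dictates the order of inputs, while in the event-driven style registered handlers fire in whatever order the user’s events happen to arrive.

```python
# Sequential style: the program controls the order of inputs.
def sequential_edit():
    font = input("Font? ")     # system asks; user must answer now
    style = input("Style? ")   # order is fixed by the program
    return font, style

# Event-driven style: the user controls the order; the program reacts.
handlers = {}

def on(event, handler):
    """Register a handler for a named event."""
    handlers[event] = handler

def dispatch(event, *args):
    """Deliver an event to its handler, whenever it arrives."""
    handlers[event](*args)

state = {"font": "Times", "style": "regular"}
on("set-font", lambda f: state.update(font=f))
on("set-style", lambda s: state.update(style=s))

# Either order yields the same final state -- the user decides.
dispatch("set-style", "bold")
dispatch("set-font", "Courier")
print(state)   # {'font': 'Courier', 'style': 'bold'}
```

Swapping the two `dispatch` calls produces the same final state, mirroring the font-then-bold versus bold-then-font example above.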

Designing the Star was not simply a matter of building an interface using windows, icons, menus, and a pointing device (WIMP); it was about designing a system on which these components could exist and work. A team at PARC led by Alan Kay developed such a system beginning around 1970. The central ingredients were a new object-oriented programming language known as Smalltalk and a software architecture known as Model-View-Controller. This was a complex programming environment that evolved in parallel with the design of the Star. It is not surprising, then, that the development of the Star spanned about 10 years, since the designers were not only inventing a new style of human-computer interaction, they were inventing the architecture on which this new style was built.

In the end, the Star was not a commercial success. While many have speculated on why (e.g., D. K. Smith and Alexander, 1988), probably the most significant reason is that the Star was not a personal computer. In the article by the Star interface designers Johnson et al. (1989), there are numerous references to the Star as a personal computer. But it seems they had a different view of “personal.” They viewed the Star as a beefed-up version of a terminal connected to a central server, “a collection of personal computers” (p. 12). In another article, designers Smith and Irby call the Star “a personal computer designed for office professionals” (1998, 17). “Personal”? Maybe, but without a doubt the Star was, first and foremost, a networked workstation connected to a server and intended for an office environment. And it was expensive: $16,000 for the workstation alone. That’s a distant world from personal computing as we know it today. It was also a distant world from personal computing as it existed in the late 1970s and early 1980s. Yes, even then personal computing was flourishing. The Apple II, introduced in 1977 by Apple Computer, was hugely successful. It was the platform on which VisiCalc, the first spreadsheet application, was developed. VisiCalc eventually sold over 700,000 copies and became known as the first “killer app.” Notably, the Star did not have a spreadsheet application, nor could it run any spreadsheet or other application available in the marketplace. The Star architecture was “closed”—it could only run applications developed by Xerox.

Other popular personal computer systems available around the same time were the PET, VIC-20, and Commodore 64, all by Commodore Business Machines, and the TRS-80 by Tandy Corp. These systems were truly personal. Most of them were located in people’s homes. But the user interface was terrible. These systems worked with a traditional command-line interface. The operating system—if you could call it that—usually consisted of a BASIC-language interpreter and a console prompt. LOAD, SAVE, RUN, EDIT, and a few other commands were about the extent of it. Although these systems were indeed personal, a typical user was a hobbyist, computer enthusiast, or anyone with enough technical skill to connect components together and negotiate the inevitable software and hardware hiccups. But users loved them, and they were cheap. However, they were tricky to use. So while the direct manipulation user interface of the Star may have been intuitive and had the potential to be used by people with no technical skill (or interest in having it!), the system just didn’t reach the right audience.

1.6 Birth of HCI (1983)

Nineteen eighty-three is a good year to peg as the birth of HCI. There are at least three key events as markers: the first ACM SIGCHI conference, the publication of Card, Moran, and Newell’s The Psychology of Human-Computer Interaction (1983), and the arrival of the Apple Macintosh, pre-announced with flyers in December 1983. The Mac launch was in January 1984, but I’ll include it here anyway.

1.6.1 First ACM SIGCHI conference (1983)

Human-computer interaction’s roots reach as early as 1969, when ACM’s Special Interest Group on Social and Behavioral Computing (SIGSOC) was formed (Borman, 1996). Initially, SIGSOC focused on computers in the social sciences. However, emphasis soon shifted to the needs and behavioral characteristics of the users, with talk about the user interface or the human factors of computing. Beginning in 1978, SIGSOC lobbied the ACM for a name change. This happened at the 1982 Conference on Human Factors in Computing Systems in Gaithersburg, Maryland, where the formation of the ACM Special Interest Group on Computer-Human Interaction (SIGCHI) was first publicly announced. Today, the ACM provides the following articulate statement of SIGCHI and its mission:

The ACM Special Interest Group on Computer-Human Interaction is the world’s largest association of professionals who work in the research and practice of computer-human interaction. This interdisciplinary group is composed of computer scientists, software engineers, psychologists, interaction designers, graphic designers, sociologists, and anthropologists, just to name some of the domains whose special expertise come to bear in this area. They are brought together by a shared understanding that designing useful and usable technology is an interdisciplinary process, and believe that when done properly it has the power to transform persons’ lives. 7

The interdisciplinary nature of the field is clearly evident in the list of disciplines that contribute to, and have a stake in, HCI.

7 Retrieved from http://www.acm.org/sigs#026 on September 10, 2012.

FIGURE 1.9

Number of papers submitted and accepted by year for the ACM SIGCHI Conference on Human Factors in Computing Systems (“CHI”). Statistics from the ACM Digital Library.

In the following year, 1983, the first SIGCHI conference was held in Boston. Fifty-nine technical papers were presented. The conference adopted a slightly modified name to reflect its new stature: ACM SIGCHI Conference on Human Factors in Computing Systems. “CHI,” as it is known (pronounced with a hard “k” sound), has been held yearly ever since and in recent years has had an attendance of about 2,500 people.

The CHI conference brings together both researchers and practitioners. The researchers are there for the technical program (presentation of papers), while the practitioners are there to learn about the latest themes of research in academia and industry. Actually, both groups are also there to network (meet and socialize) with like-minded HCI enthusiasts from around the world. Simply put, CHI is the event in HCI, and the yearly pilgrimage to attend is often the most important entry in the calendar for those who consider HCI their field.

The technical program is competitive. Research papers are peer reviewed, and acceptance requires rising above a relatively high bar for quality. Statistics compiled from 1982 to 2011 indicate a total of 12,671 paper submissions with 3,018 acceptances, for an overall acceptance rate of 24 percent. Figure 1.9 shows the breakdown by year, as provided on the ACM Digital Library website. 8 The technical program is growing rapidly. For example, the number of accepted contributions in 2011 (410) exceeded the number of submissions in 2005 (372).

Once accepted, researchers present their work at the conference, usually in a 15–20 minute talk augmented with visual slides and perhaps a video demonstration of the research. Acceptance also means the final submitted paper is published in the conference proceedings and archived in the ACM Digital Library. Some tips on writing and publishing a research paper are presented in Chapter 8.

8 Data retrieved from http://portal.acm.org. (Click on “Proceedings,” scroll down to any CHI conference proceedings, click on it, then click on the “Publication” tab.)

CHI papers have high visibility, meaning they reach a large community of researchers and practitioners in the field. One indication of the quality of the work is impact, the number of citations credited to a paper. Since the standards for acceptance are high, one might expect CHI papers to have high impact on the field of HCI. And indeed this is the case (MacKenzie, 2009a). I will say more about research impact in Chapter 4.

Although the annual CHI conference is SIGCHI’s flagship event, other conferences are sponsored or co-sponsored by SIGCHI. These include the annual ACM Symposium on User Interface Software and Technology (UIST), specialized conferences such as the ACM Symposium on Eye Tracking Research and Applications (ETRA) and the ACM Conference on Computers and Accessibility (ASSETS), and regional conferences such as the Nordic Conference on Computer-Human Interaction (NordiCHI).

1.6.2 The psychology of human-computer interaction (1983)

If two HCI researchers speaking of “Card, Moran, and Newell” are overheard, there is a good chance they are talking about The Psychology of Human-Computer Interaction—the book published in 1983 and co-authored by Stuart Card, Tom Moran, and Allen Newell. (See Figure 1.10.) The book emerged from work done at Xerox PARC. Card and Moran arrived at PARC in 1974 and soon after joined PARC’s Applied Information-Processing Psychology Project (AIP). Newell, a professor of computer science and cognitive psychology at Carnegie Mellon University in Pittsburgh, Pennsylvania, was a consultant to the project. The AIP mission was “to create an applied psychology of human-computer interaction by conducting requisite basic research within a context of application” (Card et al., 1983, p. ix).

The book contains 13 chapters organized roughly as follows: scientific foundation (100 pages), text editing examples (150 pages), modeling (80 pages), and extensions and generalizations (100 pages). So what is an “applied psychology of human-computer interaction”? Applied psychology is built upon basic research in psychology. The first 100 or so pages in the book provide a comprehensive overview of core knowledge in basic psychology as it pertains to the human sensory, cognitive, and motor systems. In the 1980s, many computer science students (and professionals) were challenged with building simple and intuitive interfaces for computer systems, particularly in view of emerging interaction styles based on a GUI. For many students, Card, Moran, and Newell’s book was their first formalized exposure to human perceptual input (e.g., the time to visually perceive a stimulus), cognition (e.g., the time to decide on the appropriate reaction), and motor output (e.g., the time to react and move the hand or cursor to a target). Of course, research in human sensory, cognitive, and motor behavior was well developed at the time. What Card, Moran, and Newell did was connect low-level human processes with the seemingly innocuous interactions humans have with computers (e.g., typing or using a mouse). The framework for this was the model human processor (MHP). (See Figure 1.11.) The MHP had an eye and an ear (for sensory input to a perceptual processor), a

FIGURE 1.10

Card, Moran, and Newell’s The Psychology of Human-Computer Interaction.

(Published by Erlbaum in 1983)

brain (with a cognitive processor, short-term memory, and long-term memory), and an arm, hand, and finger (for motor responses).

The application selected to frame the analyses in the book was text editing. This might seem odd today, but it is important to remember that 1983 predates the World Wide Web and most of today’s computing environments such as mobile computing, touch-based input, virtual reality, texting, tweeting, and so on. Text editing seemed like the right framework in which to develop an applied psychology of human-computer interaction. 9 Fortunately, all the issues pertinent to text editing are applicable across a broad spectrum of human-computer interaction.

An interesting synergy between psychology and computer science—and it is well represented in the book—is the notion that human behavior can be understood, even modeled, as an information processing activity. In the 1940s and 1950s the work of Shannon (1949), Huffman (1952), and others, on the transmission of information through electronic channels, was quickly picked up by psychologists like Miller (1956), Fitts (1954), and Welford (1968) as a way to characterize human perceptual, cognitive, and motor behavior. Card, Moran, and Newell

9 At a panel session at CHI 2008, Moran noted that the choice was between text editing and programming.

FIGURE 1.11

The model human processor (MHP) (Card et al., 1983, p. 26).

adapted information processing models of human behavior to interactive systems. The two most prominent examples in the book are Hick’s law for choice reaction time (Hick, 1952) and Fitts’ law for rapid aimed movement (Fitts, 1954). I will say more about these in Chapter 7, Modeling Interaction.

Newell later reflected on the objectives of The Psychology of Human-Computer Interaction:

We had in mind the need for a theory for designers of interfaces. The design of the interface is the leverage point in human-computer interaction. The classical emphasis of human factors and man-machine psychology on experimental analysis requires that the system or a suitable mock-up be available for experimentation, but by the time such a concrete system exists, most of the important degrees of freedom in the interface have been bound. What is needed are tools for thought for the designer—so at design time the properties and constraints of the user can be brought to bear in making the important choices. Our objective was to develop an engineering-style theory of the user that permitted approximate, back-of-the-envelope calculations of how the user would interact with the computer when operating at a terminal. (Newell, 1990, pp. 29–30)

There are some interesting points here. For one, Newell astutely identifies a dilemma in the field: experimentation cannot be done until it is too late. As he put it, the system is built and the degrees of freedom are bound. This is an overstatement, perhaps, but it is true that novel interactions in new products always seem to be followed by a flurry of research papers identifying weaknesses and suggesting and evaluating improvements. There is more to the story, however. Consider the Apple iPhone’s two-finger gestures, the Nintendo Wii’s acceleration sensing flicks, the Microsoft IntelliMouse’s scrolling wheel, or the Palm Pilot’s text-input gestures (aka Graffiti). These “innovations” were not fresh ideas born out of engineering or design brilliance. These breakthroughs, and many more, have context, and that context is the milieu of basic research in human-computer interaction and related fields. 10 For the examples just cited, the research preceded commercialization. Research by its very nature requires dissemination through publication. It is not surprising, then, that conferences like CHI and books like The Psychology of Human-Computer Interaction are fertile ground for discovering and spawning new and exciting interaction techniques.

Newell also notes that an objective in the book was to generate “tools for thought.” This is a casual reference to models—models of interaction. The models may be quantitative and predictive or qualitative and descriptive. Either way, they are tools, the carver’s knife, the cobbler’s needle. Whether generating quantitative predictions across alternative design choices or delimiting a problem space to reveal new relationships, a model’s purpose is to tease out strengths and weaknesses in a hypothetical design and to elicit opportunities to improve the design. The book includes exemplars, such as the keystroke-level model (KLM) and the goals, operators, methods, and selection rules model (GOMS). Both of these models were presented in earlier work (Card, Moran, and Newell, 1980), but were presented again in the book, with additional discussion and analysis. The book’s main contribution on modeling, however, was to convincingly demonstrate why and how models are important and to teach us how to build them. For this, HCI’s debt to

10 Of the four examples cited, research papers anticipating each are found in the HCI literature. On multi-touch finger gestures, there is Rekimoto’s “pick-and-drop” (1997), Dietz and Leigh’s DiamondTouch (Dietz and Leigh, 2001), or, much earlier, Herot and Weinzapfel’s two-finger rotation gesture on a touchscreen (Herot and Weinzapfel, 1978). On acceleration sensing, there is Harrison et al.’s “tilt me!” (1998). On the wheel mouse, there is Venolia’s “roller mouse” (Venolia, 1993). On single-stroke handwriting, there is Goldberg and Richardson’s “Unistrokes” (1993).

Card, Moran, and Newell is considerable. I will discuss descriptive and predictive models further in Chapter 7, Modeling Interaction.

Newell suggests using approximate “back of the envelope” calculations as a convenient way to describe or predict user interaction. In The Psychology of Human-Computer Interaction, these appear, among other ways, through a series of 19 interaction examples in Chapter 2 (pp. 23–97). The examples are presented as questions about a user interaction. The solutions use rough calculations but are based on data and concepts gleaned from basic research in experimental psychology. Example 10 is typical:

A user is presented with two symbols, one at a time. If the second symbol is identical to the first, he is to push the key labeled YES. Otherwise he is to push NO. What is the time between signal and response for the YES case? (Card et al., 1983, p. 66)

Before giving the solution, let us consider a modern context for the example. Suppose a user is texting a friend and is entering the word hello on a mobile phone using predictive text entry (T9). Since the mobile phone keypad is ambiguous for text entry, the correct word does not always appear. After entering 4(GHI), 3(DEF), 5(JKL), 5(JKL), 6(MNO), a word appears on the display. This is the signal in the example (see above). There are two possible responses. If the word is hello, it matches the word in the user’s mind and the user presses 0(Space) to accept the word and append a space. This is the yes response in the example. If the display shows some other word, a collision has occurred, meaning there are multiple candidates for the key sequence. The user presses *(Next) to display the next word in the ambiguous set. This is the no response in the example. As elaborated by Card, Moran, and Newell, the interaction just described is a type of simple decision known as physical matching. The reader is walked through the solution using the model human processor to illustrate each step, from stimulus to cognitive processing to motor response. The solution is approximate. There is a nominal prediction accompanied by a fastman prediction and a slowman prediction. Here’s the solution:

Reaction time = tP + 2tC + tM
              = 100[50∼200] + 2(70[25∼170]) + 70[30∼100]
              = 310[130∼640] ms                                        (1)

(Card et al., 1983, p. 69). There are four low-level processing cycles: a perceptual processor cycle (tP), two cognitive processor cycles (tC), and a motor processor cycle (tM). For each, the nominal value is bracketed by an expected minimum and maximum. The values in Equation 1 are obtained from basic research in experimental psychology, as cited in the book. The fastman–slowman range is large and demonstrates the difficulty in accurately predicting human behavior. The book has many other examples like this. There are also modern contexts for the examples, just waiting to be found and applied.
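As one such modern context, the collision in the T9 example can be sketched directly. The dictionary below is a toy and the helper names are hypothetical; a real T9 implementation also ranks candidates by word frequency.

```python
# Minimal sketch of T9-style keypad disambiguation.

KEYS = {"2": "abc", "3": "def", "4": "ghi", "5": "jkl",
        "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz"}

# Invert the keypad: each letter maps back to its digit.
LETTER_TO_KEY = {ch: k for k, letters in KEYS.items() for ch in letters}

def key_sequence(word):
    """The digit sequence a user types to enter the word."""
    return "".join(LETTER_TO_KEY[ch] for ch in word)

def candidates(sequence, dictionary):
    """All dictionary words that collide on the same key sequence."""
    return [w for w in dictionary if key_sequence(w) == sequence]

words = ["hello", "gekko", "help", "gel"]          # toy dictionary
print(key_sequence("hello"))                        # 43556
print(candidates("43556", words))                   # ['hello', 'gekko']
```

Here 4-3-5-5-6 yields two candidates; if the display shows gekko rather than hello, the user presses *(Next) to cycle to the intended word, which is exactly the yes/no physical-matching decision in Example 10.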

It might not be apparent that predicting the time for a task that takes only one third of a second is relevant to the bigger picture of designing interactive systems.

But don’t be fooled. If a complex task can be deconstructed into primitive actions, there is a good chance the time to do the task can be predicted by dividing the task into a series of motor actions interlaced with perceptual and cognitive processing cycles. This idea is presented in Card, Moran, and Newell’s book as a keystroke-level model (KLM), which I will address again in Chapter 7.
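In this spirit, the back-of-envelope arithmetic of the physical-matching example can be mechanized: sum the nominal, fastman, and slowman cycle times over the processing steps. The sketch below uses the model human processor cycle times (in ms) for one perceptual, two cognitive, and one motor cycle; the function name is hypothetical.

```python
# MHP processor cycle times in ms: (nominal, fastman, slowman).
T_P = (100, 50, 200)   # perceptual processor cycle
T_C = (70, 25, 170)    # cognitive processor cycle
T_M = (70, 30, 100)    # motor processor cycle

def reaction_time(cycles):
    """Sum (nominal, fastman, slowman) across the listed processor cycles."""
    nominal = sum(c[0] for c in cycles)
    fast = sum(c[1] for c in cycles)
    slow = sum(c[2] for c in cycles)
    return nominal, fast, slow

# Physical matching: one perceptual, two cognitive, one motor cycle.
nominal, fast, slow = reaction_time([T_P, T_C, T_C, T_M])
print(f"{nominal} [{fast} ~ {slow}] ms")   # 310 [130 ~ 640] ms
```

A longer task is predicted the same way: list its primitive perceptual, cognitive, and motor steps and sum the cycle times, which is the essence of the keystroke-level approach.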

The Psychology of Human-Computer Interaction is still available (see http://www.amazon.com) and is regularly and highly cited in research papers (5,000+ citations according to Google Scholar). At the ACM SIGCHI conference in Florence, Italy in 2008, there was a panel session celebrating the book’s 25th anniversary. Both Card and Moran spoke on the book’s history and on the challenges they faced in bringing a psychological science to the design of interactive computing systems. Others spoke on how the book affected and influenced their own research in human-computer interaction.

1.6.3 Launch of the Apple Macintosh (1984)

January 22, 1984 was a big day in sports. It was the day of Super Bowl XVIII, the championship game of the National Football League in the United States. It was also a big day in advertising. With a television audience of millions, companies were jockeying (and paying!) to deliver brief jolts of hype to viewers who were hungry for entertainment and primed to purchase the latest must-have products. One ad—played during the third quarter—was a 60-second stint for the Apple Macintosh (the Mac) personal computer. The ad, which is viewable on YouTube, used Orwell’s Nineteen Eighty-Four as a theme, portraying the Mac as a computer that would shatter the conventional image of the home computer.11 The ad climaxed with a female athlete running toward, and tossing a sledgehammer through, the face of Big Brother. The disintegration of Big Brother signaled the triumph of the human spirit over the tyranny and oppression of the corporation. Directed by Ridley Scott,12 the ad was a hit and was even named the 1980s Commercial of the Decade by Advertising Age magazine. 13 It never aired again.

The ad worked. Soon afterward, computer enthusiasts scooped up the Mac. It was sleek and sported the latest input device, a computer mouse. (See Figure 1.12.) The operating system and applications software heralded the new age of the GUI with direct manipulation and point-select interaction. The Mac was not only cool; the interface was simple and intuitive. Anyone could use it. Part of the simplicity was its one-button mouse. With one button, there was no confusion about which button to press.

There are plenty of sources chronicling the history of Apple and the events leading to the release of the Mac (Levy, 1995; Linzmayer, 2004; Moritz, 1984). Unfortunately, along with the larger-than-life stature of Apple and its flamboyant

11 Search using “1984 Apple Macintosh commercial.”
12 Known for his striking visual style, Scott directed many off-beat feature-length films such as Alien (1979), Blade Runner (1982), Thelma and Louise (1991), and Gladiator (2000).
13 http://en.wikipedia.org/wiki/1984 (advertisement).


FIGURE 1.12

leaders comes plenty of folklore to untangle. A few notable events are listed in Figure 1.13. Names of the key players are deliberately omitted.

1.7 Growth of HCI and graphical user interfaces (GUIs)

With the formation of ACM SIGCHI in 1983 and the release and success of the Apple Macintosh in 1984, human-computer interaction was off and running. GUIs entered the mainstream and, consequently, a much broader community of users and researchers were exposed to this new genre of interaction. Microsoft was a latecomer to GUIs. Early versions of Microsoft Windows appeared in 1985, but it was not until the release of Windows 3.0 (1990) and in particular Windows 3.1 (1992) that Microsoft Windows was considered a serious alternative to the Macintosh operating system. Microsoft increased its market share with improved versions of Windows, most notably Windows 95 (1995), Windows 98 (1998), Windows XP (2001), and Windows 7 (2009). Today, Microsoft operating systems for desktop computers have a market share of about 84 percent, compared to 15 percent for Apple. 14

With advancing interest in human-computer interaction, all major universities introduced courses in HCI or user interface (UI) design, with graduate students often choosing a topic in HCI for their thesis research. Many such programs of study were in computer science departments; however, HCI also emerged as a legitimate and popular focus in other areas such as psychology, cognitive science, industrial engineering, information systems, and sociology. And it wasn’t just universities that recognized the importance of the emerging field. Companies soon

14 www.statowl.com.

FIGURE 1.13

Some notable events leading to the release of the Apple Macintosh. 15

realized that designing good user interfaces was good business. But it wasn’t easy. Stories of bad UIs are legion in HCI (e.g., Cooper, 1999; Johnson, 2007; Norman, 1988). So there was work to be done. Practitioners—that is, specialists applying HCI principles in industry—are important members of the HCI community, and they form a significant contingent at many HCI conferences today.

1.8 Growth of HCI research

Research interest in human-computer interaction, at least initially, was in the quality, effectiveness, and efficiency of the interface. How quickly and accurately can people do common tasks using a GUI versus a text-based command-line interface? Or, given two or more variations in a GUI implementation, which one is quicker or more accurate? These or similar questions formed the basis of much empirical research in the early days of HCI. The same is still true today.

A classic example of a research topic in HCI is the design of menus. With a GUI, the user issues a command to the computer by selecting the command from a menu rather than typing it on the keyboard. Menus require recognition; typing

15 www.theapplemuseum.com, http://en.wikipedia.org/wiki/History_of_Apple, and www.guidebookgallery.org/articles/lisainterview, with various other sources to confirm dates and events.


FIGURE 1.14

Breadth versus depth in menu design: (a) 8×8 choices in a broad hierarchy. (b) 2×2×2×2×2×2 choices in a deep hierarchy.

requires recall. It is known that recognition is preferred over recall in user interfaces (Bailey, 1996, p. 144; Hodgson and Ruth, 1985; Howes and Payne, 1990), at least for novices, but a new problem then surfaces. If there are numerous commands in a menu, how should they be organized? One approach is to organize menu commands in a hierarchy that includes depth and breadth. The question arises: what is the best structure for the hierarchy? Consider the case of 64 commands organized in a menu. The menu could be organized with depth = 2 and breadth = 8, or with depth = 6 and breadth = 2. Both structures provide access to 64 menu items. The breadth-emphasis case gives 8² = 64 choices (Figure 1.14a). The depth-emphasis case gives 2⁶ = 64 choices (Figure 1.14b). Which organization is better? Is another organization better still (e.g., 4³ = 64)? Given these questions, it is not surprising that menu design issues were actively pursued as research topics in the early days of HCI (e.g., Card, 1982; Kiger, 1984; Landauer and Nachbar, 1985; D. P. Miller, 1981; Snowberry, Parkinson, and Sisson, 1983; Tullis, 1985).
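The candidate organizations are easy to enumerate. A small sketch (hypothetical function name) lists every breadth/depth pair whose hierarchy yields exactly 64 leaf items:

```python
# Enumerate uniform menu hierarchies with breadth ** depth == n_items.

def menu_organizations(n_items):
    """All (breadth, depth) pairs giving exactly n_items leaf items."""
    pairs = []
    for breadth in range(2, n_items + 1):
        depth = 1
        total = breadth
        while total < n_items:        # descend one more level
            total *= breadth
            depth += 1
        if total == n_items:          # hierarchy lands exactly on n_items
            pairs.append((breadth, depth))
    return pairs

print(menu_organizations(64))   # [(2, 6), (4, 3), (8, 2), (64, 1)]
```

Which of these pairs is fastest for users is precisely the empirical question the cited depth-versus-breadth studies set out to answer.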

Depth versus breadth is not the only research issue in menu design; there are many others. Should items be ordered alphabetically or by function (Card, 1982; Mehlenbacher, Duffy, and Palmer, 1989)? Does the presence of a title on a submenu improve menu access (J. Gray, 1986)? Is access improved if an icon is added to the label (Hemenway, 1982)? Do people in different age groups respond differently to broad versus deep menu hierarchies (Zaphiris, Kurniawan, and Ellis, 2003)? Is there a depth versus breadth advantage for menus on mobile devices (Geven, Sefelin, and Tschelig, 2006)? Does auditory feedback improve menu access (Zhao, Dragicevic, Chignell, Balakrishnan, and Baudisch, 2007)? Can the tilt of a mobile phone be used for menu navigation (Rekimoto, 1996)? Can menu lists be pie shaped, rather than linear (Callahan, Hopkins, Weiser, and Shneiderman, 1988)? Can pie menus be used for text entry (D. Venolia and Neiberg, 1994)?

The answers to these research questions can be found in the papers cited. They are examples of the kinds of research questions that create opportunities for empirical research in HCI. There are countless such topics of research in HCI. While we’ve seen many in this chapter, we will find many more in the chapters to come.

1.9 Other readings

Two other papers considered important in the history of HCI are:

● “Personal Dynamic Media” by A. Kay and A. Goldberg (1977). This article describes Dynabook. Although never built, Dynabook provided the conceptual basis for laptop computers, tablet PCs, and e-books.

● “The Computer for the 21st Century” by M. Weiser (1991). This is the essay that presaged ubiquitous computing. Weiser begins, “The most profound technologies are those that disappear. They weave themselves into the fabric of everyday life until they are indistinguishable from it” (p. 94).

Other sources taking a historical view of human-computer interaction include: Baecker, Grudin, Buxton, and Greenberg, 1995; Erickson and McDonald, 2007; Grudin, 2012; Myers, 1998.

1.10 Resources

The following online resources are useful for conducting research in human-computer interaction:

● Google Scholar: http://scholar.google.ca
● ACM Digital Library: http://portal.acm.org
● HCI Bibliography: http://hcibib.org

This website is available as a resource accompanying this book: ● www.yorku.ca/mack/HCIbook

Many downloads are available to accompany the examples presented herein.

STUDENT EXERCISES

1-1. The characteristics of direct manipulation include visibility of objects, incremental action, rapid feedback, reversibility, exploration, syntactic correctness of all actions, and replacing language with action. For each characteristic consider and discuss an example task performed with modern GUIs. Contrast the task with the same task as performed in a command-line environment such as UNIX, Linux, or DOS.


CHAPTER 3


Cognitive Engineering

DONALD A. NORMAN

PROLOGUE

Cognitive Engineering, a term invented to reflect the enterprise I find myself engaged in: neither Cognitive Psychology, nor Cognitive Science, nor Human Factors. It is a type of applied Cognitive Science, trying to apply what is known from science to the design and construction of machines. It is a surprising business. On the one hand, there actually is quite a lot known in Cognitive Science that can be applied. But on the other hand, our lack of knowledge is appalling. On the one hand, computers are ridiculously difficult to use. On the other hand, many devices are difficult to use; the problem is not restricted to computers; there are fundamental difficulties in understanding and using most complex devices. So the goal of Cognitive Engineering is to come to understand the issues, to show how to make better choices when they exist, and to show what the tradeoffs are when, as is the usual case, an improvement in one domain leads to deficits in another.

In this chapter I address some of the problems of applications that have been of primary concern to me over the past few years and that have guided the selection of contributors and themes of this book. The chapter is not intended to be a coherent discourse on Cognitive Engineering. Instead, I discuss a few issues that seem central to the way that people interact with machines. The goal is to determine what are the critical phenomena; the details can come later. Overall, I have two major goals:

  1. To understand the fundamental principles behind human action and performance that are relevant for the development of engineering principles of design.
  2. To devise systems that are pleasant to use. The goal is neither efficiency nor ease nor power, although these are all to be desired, but rather systems that are pleasant, even fun: to produce what Laurel calls “pleasurable engagement” (Chapter 4).

AN ANALYSIS OF TASK COMPLEXITY

Start with an elementary example: how a person performs a simple task. Suppose there are two variables to be controlled. How should we build a device to control these variables? The control question seems trivial: If there are two variables to be controlled, why not simply have two controls, one for each? What is the problem? It turns out that there is more to be considered than is obvious at first thought. Even the task of controlling a single variable by means of a single control mechanism raises a score of interesting issues. One has only to watch a novice sailor attempt to steer a small boat to a compass course to appreciate how difficult it can be to use a single control mechanism (the tiller) to affect a single outcome (boat direction). The mapping from tiller motion to boat direction is the opposite of what novice sailors sometimes expect. And the mapping of compass movement to boat movement is similarly confusing. If the sailor attempts to control the boat by examining the compass, determining in which direction to move the boat, and only then moving the tiller, the task can be extremely difficult.

Experienced sailors will point out that this formulation puts the problem in its clumsiest, most difficult form: With the right formulation, or the right conceptual model, the task is not complex. That comment makes two points. First, the description I gave is a reasonable one for many novice sailors: The task is quite difficult for them. The point is not that there are simpler ways of viewing the task, but that even a task that has but a single mechanism to control a single variable can be difficult to understand, to learn, and to do. Second, the comment reveals the power of the proper conceptual model of the situation: The correct conceptual model can transform confusing, difficult tasks into simple, straightforward ones. This is an important point that forms the theme of a later section.

Psychological Variables Differ From Physical Variables


There is a discrepancy between the person’s psychologically expressed goals and the physical controls and variables of the task. The person starts with goals and intentions. These are psychological variables. They exist in the mind of the person and they relate directly to the needs and concerns of the person. However, the task is to be performed on a physical system, with physical mechanisms to be manipulated, resulting in changes to the physical variables and system state. Thus, the person must interpret the physical variables into terms relevant to the psychological goals and must translate the psychological intentions into physical actions upon the mechanisms. This means that there must be a stage of interpretation that relates physical and psychological variables, as well as functions that relate the manipulation of the physical variables to the resulting change in physical state.

In many situations the variables that can easily be controlled are not those that the person cares about. Consider the example of bathtub water control. The person wants to control rate of total water flow and temperature. But water arrives through two pipes: hot and cold. The easiest system to build has two faucets and two spouts. As a result, the physical mechanisms control rate of hot water and rate of cold water. Thus, the variables of interest to the user interact with the two physical variables: Rate of total flow is the sum of the two physical variables; temperature is a function of their difference (or ratio). The problems come from several sources:

1. Mapping problems. Which control is hot, which is cold? Which way should each control be turned to increase or decrease the flow? (Despite the appearance of universal standards for these mappings, there are sufficient variations in the standards, idiosyncratic layouts, and violations of expectations, that each new faucet poses potential problems.)

2. Ease of control. To make the water hotter while maintaining total rate constant requires simultaneous manipulation of both faucets.

3. Evaluation. With two spouts, it is sometimes difficult to determine if the correct outcome has been reached.
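The translation the user must perform can be made concrete. A minimal sketch, assuming fixed supply temperatures and a simple flow-weighted mixing model (both invented for illustration), that inverts the psychological variables (total flow, temperature) into the physical ones (hot rate, cold rate) that the two faucets actually control:

```python
HOT_TEMP = 60.0    # supply temperatures in degrees C (illustrative values)
COLD_TEMP = 10.0

def faucet_settings(total_flow, temperature):
    """Translate psychological variables into physical ones.

    Assumes the mixed temperature is the flow-weighted average:
        temperature = (HOT_TEMP*hot + COLD_TEMP*cold) / (hot + cold)
    """
    if not COLD_TEMP <= temperature <= HOT_TEMP:
        raise ValueError("unreachable temperature")
    hot = total_flow * (temperature - COLD_TEMP) / (HOT_TEMP - COLD_TEMP)
    cold = total_flow - hot
    return hot, cold

# Making the water hotter at constant total flow moves BOTH controls:
print(faucet_settings(10.0, 35.0))  # (5.0, 5.0)
print(faucet_settings(10.0, 45.0))  # (7.0, 3.0)
```

The point of the sketch is that the user's head must carry this inversion: a change in one psychological variable requires coordinated changes to both physical mechanisms.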


Faucet technology evolved to solve the problem. First, mixing spouts were devised that aided the evaluation problem. Then "single control" faucets were devised that varied the psychological factors directly: One dimension of movement of the control affects rate of flow; another, orthogonal dimension affects temperature. These controls are clearly superior to use. They still do have a mapping problem (knowing what kind of movement to which part of the mechanism controls which variable), and because the mechanism is no longer as visible as in the two-faucet case, they are not quite so easy to understand for the first-time user. Still, faucet design can be used as a positive example of how technology has responded to provide control over the variables of psychological interest rather than over the physical variables that are easier and more obvious.

It is surprisingly easy to find other examples of the two-variable, two-control task. The water faucet is one example. The loudness and balance controls on some audio sets are another. The temperature controls of some refrigerator-freezer units are yet another. Let me examine this latter example, for it illustrates a few more issues that need to be considered, including the invisibility of the control mechanisms and a long time delay between adjustment of the control and the resulting change of temperature.

There are two variables of concern to the user: the temperature of the freezer compartment and the temperature of the regular "fresh food" compartment. At first, this seems just like the water control example, but there is a difference. Consider the refrigerator that I own. It has two compartments, a freezer and a fresh foods one, and two controls, both located in the fresh foods section. One control is labeled "freezer," the other "fresh food," and there is an associated instruction plate (see the illustration). But what does each control do? What is the mapping between their settings and my goal? The labels seem clear enough, but if you read the "instructions," confusion can rapidly set in. Experience suggests that the action is not as labeled: The two controls interact with one another. The problems introduced by this example seem to exist at almost every level:

  1. Matching the psychological variables of interest to the physical variables being controlled.
  2. The mapping relationships. There is clearly strong interaction between the two controls, making simple mapping between control and function or control and outcome difficult.
  3. Feedback. Very slow, so that by the time one is able to determine the result of an action, so much time has passed that the action is no longer remembered, making "correction" of the action difficult.
  4. Conceptual model. None. The instructions seem deliberately

I suspect that this problem results from the way this refrigerator's cooling mechanism is constructed. The two variables of psychological interest cannot be controlled directly. Instead, there is only one cooling mechanism and one thermostat, which therefore must be located in either the "fresh food" section or in the freezer, but not both. A good description of this mechanism, stating which control affected which function, would probably make matters workable. If one mechanism were clearly shown to control the thermostat and the other to control the relative proportion of cold air directed toward the freezer and fresh foods section, the task would be much easier. The user would be able to get a clear conceptual model of the operation. Without a conceptual model, with a 24-hour delay between setting the controls and determining the results, it is almost impossible to determine how to operate the controls.

Two variables: two controls. Who could believe that it would be so difficult?
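The suspected mechanism described above can be simulated. A toy model (all coefficients invented for illustration, not taken from any real appliance) of one cooling unit plus one air-proportion valve shows why the two labeled controls necessarily interact:

```python
def compartment_temps(thermostat, valve):
    """Toy model of a refrigerator with ONE cooling unit.

    thermostat: plenum temperature of the single cooling unit (degrees C)
    valve: fraction of cold air routed to the freezer (0..1)
    Returns (freezer_temp, fresh_food_temp). Coefficients are invented.
    """
    # More cold air to a compartment pulls it closer to the plenum
    # temperature; the other compartment warms correspondingly.
    freezer = thermostat + 16.0 * (1.0 - valve)
    fresh = thermostat + 16.0 * (1.0 + valve)
    return freezer, fresh

# Turning the valve to cool "only" the freezer also warms the fresh
# food compartment -- exactly the interaction the labels hide:
print(compartment_temps(-30.0, 0.5))   # (-22.0, -6.0)
print(compartment_temps(-30.0, 0.75))  # (-26.0, -2.0)
```

With a stated model like this, the user could predict the interaction; without it, and with a 24-hour feedback delay, correction by trial and error is nearly hopeless.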

Even Simple Tasks Involve a Large Number of Aspects

The conclusion to draw from these examples is that even with two variables, the number of aspects that must be considered is surprisingly large. Thus, suppose the person has two psychological goals, G1 and G2. These give rise to two intentions, I1 and I2, to satisfy the goals. The system has some physical state, S, realized through the values of its variables: For convenience, let there be two variables of interest, V1 and V2. And let there be two mechanisms that control the system, M1 and M2.

So we have the psychological goals and intentions (G and I) and the physical state, mechanisms, and variables (S, M, and V). First, the person must examine the current system state, S, and evaluate it with respect to the goals, G. This requires translating the physical state of the system into a form consistent with the psychological goal. Thus, in the case of steering a boat, the goal is to reach some target, but the physical state is the numerical compass heading. In writing a paper, the goal may be a particular appearance of the manuscript, but the physical state may be the presence of formatting commands in the midst of the text. The difference between desired goal and current state gives rise to an intention, again stated in psychological terms. This must get translated into an action sequence, the specification of what physical acts will be performed upon the mechanisms of the system. To go from intention to action specification requires consideration of the mapping between physical mechanisms and system state, and between system state and the resulting psychological interpretation. There may not be a simple mapping between the mechanisms and the resulting physical variables, nor between the physical variables and the resulting psychological states. Thus, each physical variable might be affected by an interaction of the control mechanisms: V1 = f(M1, M2) and V2 = g(M1, M2). In turn, the system state, S, is a function of all its variables: S = h(V1, V2). And finally, the mapping between system state and psychological interpretation is complex. All in all, the two-variable, two-mechanism situation can involve a surprising number of aspects.

The list of aspects is shown and defined in Table 3.1.
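The chain of mappings V1 = f(M1, M2), V2 = g(M1, M2), S = h(V1, V2) can be written out directly. In the sketch below, f, g, and h are arbitrary placeholder functions of my own, chosen only to show that moving one mechanism disturbs both variables of interest:

```python
# The physical variables are interacting functions of both mechanisms,
# and the system state combines them:
#   V1 = f(M1, M2),  V2 = g(M1, M2),  S = h(V1, V2)

def f(m1, m2):           # V1 depends on both mechanisms
    return m1 + 0.5 * m2

def g(m1, m2):           # and so does V2
    return m2 - 0.5 * m1

def h(v1, v2):           # the system state summarizes the variables
    return (v1, v2)

def system_state(m1, m2):
    return h(f(m1, m2), g(m1, m2))

# Moving only M1 changes BOTH variables of psychological interest:
print(system_state(1.0, 0.0))   # (1.0, -0.5)
print(system_state(2.0, 0.0))   # (2.0, -1.0)
```

The user, of course, never sees f, g, or h directly; they must be inferred, which is why the interpretive aspects of Table 3.1 matter.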


TABLE 3.1

Goals and intentions: A goal is the state the person wishes to achieve; an intention is the decision to act so as to achieve the goal.

Specification of the action sequence: The psychological process of determining the psychological representation of the actions that are to be executed by the user on the mechanisms of the system.

Mapping from psychological goals and intentions to action sequence: In order to specify the action sequence, the user must translate the psychological goals and intentions into the desired system state, then determine what settings of the control mechanisms will yield that state, and then determine what physical manipulations of the mechanisms are required. The result is the internal, mental specification of the actions that are to be executed.

Physical state of the system: The physical state of the system, determined by the values of all its physical variables.

Control mechanisms: The physical devices that control the physical variables.

Mapping between the physical mechanisms and system state: The relationship between the settings of the mechanisms of the system and the system state.

Interpretation of system state: The relationship between the physical state of the system and the psychological goals of the user can only be determined by first translating the physical state into psychological states (perception), then interpreting the perceived system state in terms of the psychological variables of interest.

Evaluating the outcome: Evaluation of the system state requires comparing the interpretation of the perceived system state with the desired goals. This often leads to a new set of goals and intentions.

TOWARD A THEORY OF ACTION

It seems clear that we need to develop theoretical tools to understand what the user is doing. We need to know more about how people actually do things, which means a theory of action. There isn't any realistic hope of getting the theory of action, at least for a long time, but certainly we should be able to develop approximate theories.[1] And that is what follows: an approximate theory of action which distinguishes among different stages of activities, not necessarily always used nor applied in that order, but different kinds of activities that appear to capture the critical aspects of doing things.

The stages have proved to be useful in analyzing systems and in guiding design. The essential components of the theory have already been introduced in Table 3.1.

In the theory of action to be considered here, a person interacts with a system, in this case a computer. Recall that the person's goals are expressed in terms relevant to the person (in psychological terms) and the system's mechanisms and states are expressed in terms relative to it (in physical terms). The discrepancy between psychological and physical variables creates the major issues that must be addressed in the design, analysis, and use of systems. I represent the discrepancies as two gulfs that must be bridged: the Gulf of Execution and the Gulf of Evaluation, both shown in Figure 3.1.[2]

The Gulfs of Execution and Evaluation

The user of the system starts off with goals expressed in psychological terms. The system, however, presents its current state in physical terms. Goals and system state differ significantly in form and content, creating the Gulfs that need to be bridged if the system is to be used (Figure 3.1). The Gulfs can be bridged by starting in either direction. The designer can bridge the Gulfs by starting at the system side and moving closer to the person by constructing the input and output characteristics of the interface so as to make better matches to the psychological needs of the user. The user can bridge the Gulfs by creating plans, action sequences, and interpretations that move the normal description of the goals and intentions closer to the description required by the physical system (Figure 3.2).

FIGURE 3.1. The Gulfs of Execution and Evaluation. Each Gulf is unidirectional.

[1] There is little prior work in psychology that can act as a guide. Some of the principles come from the study of servomechanisms and cybernetics. The first study known to me in psychology (and in many ways still the most important analysis) is the book Plans and the Structure of Behavior by Miller, Galanter, and Pribram (1960), early in the history of information processing psychology. Powers (1973) applied concepts from control theory to cognitive concerns. In the work most relevant to the study of Human-Computer Interaction, Card, Moran, and Newell (1983) analyzed the cycle of activities from Goal through Selection: the GOMS model (Goal, Operator, Methods, Selection). Their work is closely related to the approach given here. This is an issue that has concerned me for some time, so some of my own work is relevant: the analysis of errors, of typing, and of the attentional control of actions (Norman, 1981a, 1984b, 1986; Norman & Shallice, 1985; Rumelhart & Norman, 1982).

[2] The emphasis on the discrepancy between the user and the system, and the suggestion that we should conceive of the discrepancy as a Gulf that must be bridged by the user and the system designer, came from Jim Hollan and Ed Hutchins during one of the many revisions of the Direct Manipulation chapter (Chapter 5).

Bridging the Gulf of Execution. The gap from goals to physical system is bridged in four segments: intention formation, specifying the action sequence, executing the action, and, finally, making contact with the input mechanisms of the interface. The intention is the first step, and it starts to bridge the gulf, in part because the interaction language demanded by the physical system comes to color the thoughts of the person, a point expanded upon in Chapter 5 by Hutchins, Hollan, and Norman. Specifying the action sequence is a nontrivial exercise in planning (see Riley & O'Malley, 1985). It is what Moran calls matching the internal specification to the external (Moran, 1983). In the terms of the aspects listed in Table 3.1, specifying the action requires translating the psychological goals of the intention into the changes to be made to the physical variables actually under control of the system. This, in turn, requires following the mapping between the psychological intentions and the physical actions permitted on the mechanisms of the system, as well as the mapping between the physical mechanisms and the resulting physical state variables, and between the physical state of the system and the psychological goals and intentions. After an appropriate action sequence is determined, the actions must be executed. Execution is the first physical action in this sequence: Forming the goals and intentions and specifying the action sequence were all mental events. Execution of an action means to do something, whether it is just to say something or to perform a complex motor sequence. Just what physical actions are required is determined by the choice of input devices on the system, and this can make a major difference in the usability of the system. Because some physical actions are more difficult than others, the choice of input devices can affect the selection of actions, which in turn affects how well the system matches with intentions. On the whole, theorists in this business tend to ignore the input devices, but in fact, the choice of input device can often make an important impact on the usability of a system. (See Chapter 15 by Buxton for a discussion of this frequently overlooked point.)

FIGURE 3.2. The Execution Bridge.

Bridging the Gulf of Evaluation. Evaluation requires comparing the interpretation of system state with the original goals and intentions. One problem is to determine what the system state is, a task that can be assisted by appropriate output displays by the system itself. The outcomes are likely to be expressed in terms of physical variables that bear complex relationships to the psychological variables of concern to the user and in which the intentions were formulated. The gap from system to user is bridged in four segments: starting with the output displays of the interface, moving to the perceptual processing of those displays, to its interpretation, and finally, to the evaluation: the comparison of the interpretation of system state with the original goals and intention. But in doing all this, there is one more problem, one just beginning to be understood, and one not assisted by the usual forms of displays: the problem of level. There may be many levels of outcomes that must be matched with different levels of intentions (see Norman, 1981a; Rasmussen, in press; Rasmussen & Lind, 1981). And, finally, if the change in system state does not occur immediately following the execution of the action sequence, the resulting delay can severely impede the process of evaluation, for the user may no longer remember the details of the intentions or the action sequence.

Stages of User Activities

A convenient summary of the analysis of tasks is that the process of performing and evaluating an action can be approximated by seven stages of user activity[3] (Figure 3.3):

● Establishing the Goal
● Forming the Intention
● Specifying the Action Sequence
● Executing the Action
● Perceiving the System State
● Interpreting the State
● Evaluating the System State with respect to the Goals and Intentions

[3] The last two times I spoke of an approximate theory of action (Norman, 1984a, 1985) I spoke of four stages. Now I speak of seven. An explanation seems to be in order. The answer really is simple. The full theory of action is not yet in existence, but whatever its form, it involves a continuum of stages on both the action/execution side and the perception/evaluation side. The notion of stages is a simplification of the underlying theory: I do not believe that there really are clean, separable stages. However, for practical application, approximating the activity into stages seems reasonable and useful. Just what division of stages should be made, however, seems less clear. In my original formulations, I suggested four stages: intention, action sequence, execution, and evaluation. In this chapter I separated goals and intentions and expanded the analysis of evaluation by adding perception and interpretation, thus making the stages of evaluation correspond better with the stages of execution: Perception is the evaluatory equivalent of execution, interpretation the equivalent of the action sequence, and evaluation the equivalent of forming the intention. The present formulation seems a richer, more satisfactory analysis.
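The seven stages can be rendered as a schematic control loop. The sketch below is my own illustration (the numeric model and helper values are invented), using a water-temperature goal as the running example; the comments follow the stage names given earlier:

```python
# Schematic of the seven stages as an interaction loop. The "system"
# is a mixed faucet: the psychological goal is a temperature, the
# physical mechanism is a hot-water fraction. All numbers are invented.

def run_interaction(goal_temp, tolerance=1.0):
    hot_fraction = 0.0                        # physical mechanism setting
    goal = goal_temp                          # 1. establishing the goal
    felt_temp = None
    for _ in range(20):
        intention = "make the water warmer"   # 2. forming the intention
        action = 0.05                         # 3. specifying the action sequence
        hot_fraction += action                # 4. executing the action
        reading = 10 + 50 * hot_fraction      # 5. perceiving the system state
        felt_temp = reading                   # 6. interpreting the state
        if abs(felt_temp - goal) <= tolerance:  # 7. evaluating vs. the goal
            break                             # goal satisfied; stop acting
    return felt_temp

print(run_interaction(35.0))
```

As the text goes on to stress, real activity is not this tidy: stages repeat, appear out of order, or are driven by events rather than goals; the loop is only the idealized skeleton.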


FIGURE 3.3. Seven stages of user activities involved in the performance of a task. The primary, central stage is the establishment of the goal. Then, to carry out an action requires three stages: forming the intention, specifying the action sequence, and executing the action. To assess the effect of the action also requires three stages, each in some sense complementary to the three stages of carrying out the action: perceiving the system state, interpreting the state, and evaluating the interpreted state with respect to the original goals and intentions.

Real activity does not progress as a simple sequence of stages. Stages appear out of order, some may be skipped, some repeated. Even the analysis of relatively simple tasks demonstrates the complexities. Moreover, in some situations, the person is reactive (event or data driven), responding to events, as opposed to starting with goals and intentions. Consider the task of monitoring a complex, ongoing operation. The person's task is to respond to observations about the state of the system. Thus, when an indicator starts to move a bit out of range, or when something goes wrong and an alarm is triggered, the operator must diagnose the situation and respond appropriately. The diagnosis leads to the formation of goals and intentions:

Evaluation includes not only checking on whether the intended actions were executed properly and intentions satisfied, but whether the original diagnosis was appropriate. Thus, although the stage analysis is relevant, it must be used in ways appropriate to the situation.

Consider the example of someone who has written a letter on a computer word-processing system. The overall goal is to convey a message to the intended recipient. Along the way, the person prints a draft of the letter.

Suppose the person decides that the draft, shown in Figure 3.4A, doesn't look right: The person, therefore, establishes the intention "Improve the appearance of the letter." Call this first intention intention1.

Note that this intention gives little hint of how the task is to be accomplished. As a result, some problem solving is required, perhaps ending with intention2: "Change the indented paragraphs to block paragraphs." To do this requires intention3: "Change the occurrences of .pp in the source code for the letter to .sp." This in turn requires the person to generate an action sequence appropriate for the text editor, and then, finally, to execute the actions on the computer keyboard. Now, to evaluate the results of the operation requires still further operations, including generation of a fourth intention, intention4: "Format the file" (in order to see whether intention2 and intention1 were satisfied). The entire sequence of stages is shown in Figure 3.4B. The final product, the reformatted letter, is shown in Figure 3.4C. Even intentions that appear to be quite simple (e.g., intention1: "Improve the appearance of the letter") lead to numerous subintentions. The intermediary stages may require generating some new subintentions.
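The action sequence that satisfies intention3 can itself be sketched as a small batch edit. The .pp and .sp formatting commands come from the example above; the function name and sample file contents are my own illustration:

```python
# intention3: "Change the occurrences of .pp in the source code to .sp."
# A sketch of the action sequence as an edit over the letter's source
# (troff-style dot commands at the start of a line, as in the example).

def change_paragraph_style(source: str) -> str:
    """Rewrite indented-paragraph commands as block-paragraph commands."""
    return "\n".join(
        ".sp" + line[3:] if line.startswith(".pp") else line
        for line in source.splitlines()
    )

draft = ".pp\nDear Ms. Jones,\n.pp\nThank you for your letter."
print(change_paragraph_style(draft))
```

Note that running this edit satisfies only intention3; intention4 ("Format the file") is still needed before the person can evaluate whether intention2 and intention1 were actually achieved.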

Practical Implications

The existence of the two gulfs points out a critical requirement for the design of the interface: to bridge the gap between goals and system. Moreover, as we have seen, there are only two ways to do this: move the system closer to the user; move the user closer to the system. Moving from the system to the user means providing an interface that matches the user's needs, in a form that can be readily interpreted and manipulated. This confronts the designer with a large number of issues. Not only do users differ in their knowledge, skills, and needs, but for even a single user the requirements for one stage of activity can conflict with the requirements for another. Thus, menus can be thought of as information to assist in the stages of intention formation and action specification, but they frequently make execution more difficult. The attempt to aid evaluation by presenting extra information can impair intention selection, in part by providing distractions.

FIGURE 3.4. Sequence of stages in a typical task. (A) The starting point. The letter doesn't look right, so the initial intention is "improve the appearance of the letter." (B) The sequence of stages necessary to make the appropriate changes to the source file of the manuscript, then to get a printed, formatted copy of the letter, and finally, to evaluate the outcome against the several levels of intentions. (C) The final product, the reformatted letter.

On the other hand, failure to provide information can make life more complex for the user, making it harder to get the job done and adding to the frustrations with the system if the user is left bewildered, not knowing what options are available or what is happening.

Many systems can be characterized by how well they support the different stages. The argument over whether action specification should be done by command language or by pointing at menu options or icons turns out to be an argument over the relative merits of support for the stages of Execution and Action Specification.

Visual presence can aid the various stages of activity. Thus, we give support to the generation of intentions by reminding the user of what is possible. We support action selection because the visible items act as a direct translation into possible actions. We aid execution, especially if execution by pointing (throwing switches) is possible. And we aid evaluation by making it possible to provide visual reminders of what was done.

Visual structure can aid in the interpretation. Thus, for some purposes, graphs, pictures, and moving images will be superior to words: In other situations words will be superior.

Moving from psychological variables to physical variables can take effort. The user must translate goals conceived in psychological terms to actions suitable for the system. Then, when the system responds, the user must interpret the output, translating the physical display of the interface back into psychological terms. The major responsibility should rest with the system designer to assist the user in understanding the system. This means providing a good, coherent design model and a consistent, relevant system image.

CONCEPTUAL MODELS AND THE SYSTEM IMAGE

There are two sides to the interface: the system side and the human side. The stages of execution and perception mediate between psycho logical and physical representations. And the input mechanism and output displays of the system mediate between the psychological and physical representations. We change the interface at the system side through proper design. We change the interface at the human side through training and experience. In the ideal case, no psychological effort is required to bridge the gulfs. But such a situation occurs only either with simple situations or with experienced, expert users. With complex tasks or with nonexpert users, the user must engage in a plan ning process to go from intentions to action sequence. This planning process, oftentimes involving active problem solving, is aided when the

46 DONALD A. NORMAN

person has a good conceptual understanding of the physical system, an argument developed more fully by Riley in Chapter 7. Think of a conceptual model of the system as providing a scaffolding upon which to build the bridges across the gulfs. The scaffoldings provided by these conceptual models are probably only important during learning and trouble-shooting, but for these situations they are essential: They allow the user to derive possible courses of action and possible system responses. Expert users can usually do without them. The problem is to design the system so that, first, it follows a consistent, coherent conceptualization (a design model) and, second, the user can develop a mental model of that system (a user model) consistent with the design model.

Mental models seem a pervasive property of humans. I believe that people form internal, mental models of themselves and of the things and people with whom they interact. These models provide predictive and explanatory power for understanding the interaction. Mental models evolve naturally through interaction with the world and with the particular system under consideration (see Owen's description in Chapter 9 and the discussion by Riley, Chapter 7). These models are highly affected by the nature of the interaction, coupled with the person's prior knowledge and understanding. The models are neither complete nor accurate (see Norman, 1983a), but nonetheless they function to guide much human behavior.


3. COGNITIVE ENGINEERING

47

There really are three different concepts to be considered: two mental, one physical. First, there is the conceptualization of the system held by the designer; second, there is the conceptual model constructed by the user; and third, there is the physical image of the system from which the users develop their conceptual models. Both of the conceptual models are what have been called "mental models," but to separate the several different meanings of that term, I refer to these two aspects by different terms.

I call the conceptual model held by the designer the Design Model, and the conceptual model formed by the user the User's Model. The third concept is the image resulting from the physical structure that has been built (including the documentation and instructions): I call that the System Image.

The Design Model is the conceptual model of the system to be built. Ideally, this conceptualization is based on the user's task, requirements, and capabilities. It must also consider the user's background and experience, and the powers and limitations of the user's information-processing mechanisms, most especially processing resources and short-term memory limits.

The user develops a mental model of the system: the User's Model. Note that the User's Model is not formed from the Design Model: It results from the way the user interprets the System Image. Thus, in many ways, the primary task of the designer is to construct an appropriate System Image, realizing that everything the user interacts with helps to form that image: the physical knobs, dials, keyboards, and displays, and the documentation, including instruction manuals, help facilities, text input and output, and error messages. The designer should want the User's Model to be compatible with the underlying conceptual model, the Design Model. And this can only happen through interaction with the System Image. These comments place a severe burden on the designer. If one hopes for the user to understand a system, to use it properly, and to enjoy using it, then it is up to the designer to make the System Image explicit, intelligible, and consistent. And this goes for everything associated with the system. Remember too that people do not always read documentation, and so the major (perhaps entire) burden is placed on the image that the system projects.4

Finally, there is yet another model to worry about: the model that an intelligent program might construct of the person with which it is interacting. This too has been called a user model and is discussed by Mark in Chapter 11.

4 The story is actually more complex. The "user's model" can refer to two distinct things: the individual user's own personal, idiosyncratic model (which is the meaning I intended), or the generalized "typical user" model that the designer develops to help in the formulation of the Design Model. I jumped between these two different meanings in this paragraph.


There do exist good examples of systems that present a System Image to the user in a clear, consistent fashion, following a carefully chosen conceptual model in such a way that the User's Model matches the Design Model. One example is the spreadsheet programs (starting with VISICALC), systems that match the conceptualizations of the targeted user, the accountant or budget planner. Another good example is the stack calculator, especially the early designs from Hewlett-Packard. And a third example is the "office desk" metaphor followed in the Xerox Star, Apple Lisa, and Macintosh workstations.

It is easier to design consistent Design Models for some things than for others. In general, the more specialized the tool and the higher the level at which a system operates, the easier the task. Spreadsheets are relatively straightforward. General-purpose operating systems or programming languages are not. Whenever there is one single task and one set of users, the task of developing the conceptual model is much simplified. When the system is general purpose, with a relatively unlimited set of users and powers, then the task becomes complex, perhaps undoable. In this case, it may be necessary to have conceptualizations that depend on the use to which the system is being put.

This discussion is meant to introduce the importance and the difficulties of conceptual models.5 Further discussion of these issues occurs throughout this book, but most especially in the chapters by diSessa (Chapter 10), Mark (Chapter 11), Owen (Chapter 9), and Riley (Chapter 7).

ON THE QUALITY OF HUMAN-COMPUTER INTERACTION

The theme of quality of the interaction and "conviviality" of the interface is important, a theme worth speaking of with force. So for the moment, let me move from a discussion of theories of action and conceptual models and speak of the qualitative nature of human-computer interaction. The details of the interaction matter, ease of use matters, but I want more than correct details, more than a system that is easy to learn or to use: I want a system that is enjoyable to use. This is an important, dominating design philosophy, easier to say than to do. It implies developing systems that provide a strong sense of understanding and control. This means tools that reveal their underlying conceptual model and allow for interaction, tools that emphasize comfort, ease, and pleasure of use: what Illich (1973) has called convivial tools.

A major factor in this debate is the feeling of control that the user has over the operations that are being performed. A "powerful," "intelligent" system can lead to the well-documented problems of "overautomation," causing the user to be a passive observer of operations, no longer in control of either what operations take place or of how they are done. On the other hand, systems that are not sufficiently powerful or intelligent can leave too large a gap in the mappings from intention to action execution and from system state to psychological interpretation. The result is that operation and interpretation are complex and difficult, and the user again feels out of control, distanced from the system. Laurel approaches this issue of control over one's activities from the perspective of drama in her chapter, Interface as Mimesis (Chapter 4). To Laurel, the critical aspect is "pleasurable engagement," by which she means the complete and full engagement of the person in pursuit of the "end cause" of the activity. The computer should be invisible to the user, acting as the means by which the person enters into the engagement, but avoiding intrusion into the ongoing thoughts and activities.

5 There has been a lot said, but little accomplished, on the nature and importance of mental models in the use of complex systems. The book Mental Models, edited by Gentner and Stevens (1983), is perhaps the first attempt to spell out some of the issues. And Johnson-Laird's book (1983), with the same title, gets at one possible theoretical understanding of the mental models that people create and use in everyday life. At the time this is being written, the best publication on the role of a mental model in learning and using a complex system is the paper by Kieras and Bovair (1984).

The Power of Tools

When I look around at instances of good system design, systems that I think have had profound influence upon the users, I find that what seems more important than anything else is that they are viewed as tools. That is, the system is deemed useful because it offers powerful tools that the user is able to apply constructively and creatively, with understanding. Here is a partial list of system innovations that follow these principles:

• Smalltalk. This language, and more importantly, the design philosophy used in getting there, emphasizes the development of tools at an appropriate conceptual level, with object-oriented, message-passing software, where new instances or procedures are derived from old instances, with derived (inherited) conditions and values, and with the operations visible as graphic objects, if you so want them to be (Goldberg, 1984; Tesler, 1981).

• The Xerox Star computer. A carefully done, psychologically motivated approach to the user interface, emphasizing a consistent, well-thought-through user model (Smith, Irby, Kimball, Verplank, & Harslem, 1982). The implementation has changed how we think of interfaces. The Star was heavily influenced by Smalltalk and it, in turn, led to the Apple Lisa and Macintosh.

• UNIX. The underlying philosophy is to provide a number of small, carefully crafted operations that can be combined in a flexible manner under the control of the user to do the task at hand. It is something like a construction set of computational procedures. The mechanisms that make this possible are a consistent data structure and the ability to concatenate programs (via "pipes" and input-output redirection). The interface suffers multiple flaws and is easily made the subject of much ridicule. But the interface has good ideas: aliases, shell scripts, pipes, terminal independence, and an emphasis on shared files and learning by browsing. Elsewhere I have scolded it for its shortcomings (Compton, 1984; Norman, 1981b), but we should not overlook its strengths.

• Interlisp (and the Lisp machines). Providing a powerful environment for Lisp program development, integrating editor, debugger, compiler, and interpreter, nowadays coupled with graphics and windows. To say nothing of DWIM, Do What I Mean (see Teitelman & Masinter, 1981).

• Spreadsheets. Merging the computational power of the computer with a clean, useful conceptual model, allowing the interface to drive the entire system, providing just the right tools for a surprising variety of applications.

• Steamer. A teaching system based on the concept of intelligent graphics that makes visible to the student the operations of an otherwise abstract and complex steam generator system for large ships (Hollan, Hutchins, & Weizman, 1984).


• Bill Budge’s Pinball Construction Set (Budge, 1983). A game, but one that illustrates the toolkit notion of interface, for the user can manipulate the structures at will to create the game of choice. It is easy to learn, easy to use, yet powerful. There is no such thing as an illegal operation; there are no error messages, and no need for any. Errors are simply situations where the operation is not what is desired. There are no new concepts in this game beyond those illustrated by the other items on this list, but the other examples require powerful computers, whereas this works on home machines such as the Apple II, thus bringing the concept to the home.
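The UNIX entry above credits much of the system's power to small, single-purpose operations concatenated under user control. A minimal sketch of that composition style, assuming nothing beyond the standard library; the filter functions here are my own invented stand-ins for tools like `sort` and `uniq -c`, not real UNIX programs:

```python
# Pipe-style composition of small, single-purpose text filters,
# in the spirit of the UNIX pipeline `sort | uniq -c`. Each filter
# does one job and consumes/produces a list of plain text lines.

def sort_lines(lines):
    # Analogous to `sort`: order the lines.
    return sorted(lines)

def uniq_count(lines):
    # Analogous to `uniq -c`: collapse adjacent duplicates,
    # prefixing each surviving line with its count.
    out = []
    for line in lines:
        if out and out[-1][1] == line:
            out[-1][0] += 1
        else:
            out.append([1, line])
    return [f"{n} {line}" for n, line in out]

def pipe(data, *stages):
    # Feed the output of each stage into the next, like `a | b | c`.
    for stage in stages:
        data = stage(data)
    return data

words = ["apple", "pear", "apple", "fig", "pear", "apple"]
print(pipe(words, sort_lines, uniq_count))
# ['3 apple', '1 fig', '2 pear']
```

The point of the sketch is the design choice Norman praises: because every stage shares one data structure (lines of text), any user can recombine the stages in ways the authors of the individual tools never anticipated.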

This list is idiosyncratic. It leaves out some important examples in favor of ones of lesser importance. Nonetheless, these are the items that have affected me the most. The major thing all these systems offer is a set of powerful tools to the user.

The Problem With Tools

The Pinball Construction Set illustrates some of the conflicts that tools present, especially conflict over how much intelligence should be present. Much as I enjoy manipulating the parts of the pinball sets, much as my 4-year-old son could learn to work it with almost no training or bother, neither of us is any good at constructing pinball sets. I can’t quite get the parts in the right places: When I stretch a part to change its shape, I usually end up with an unworkable part. Balls get stuck in weird corners. The action is either too fast or too slow. Yes, it is easy to change each problem as it is discovered, but the number seems endless. I wish the tools were more intelligent: do as I am intending, not as I am doing.

(This point is examined in more detail in Chapter 5 by Hutchins, Hollan, and Norman.)

Simple tools have problems because they can require too much skill from the user. Intelligent tools can have problems if they fail to give any indication of how they operate and of what they are doing. The user can feel like a bystander, watching while unexplained operations take place. The result is a feeling of lack of control over events. This is a serious problem, one that is well known to students of social psychology. It is a problem whether it occurs to the individual while interacting with colleagues, while a passenger in a runaway vehicle, or while using a computer. If we take the notion of "conviviality" seriously, we will develop tools that make visible their operations and assumptions. Intelligent tools require, among other things, developing a good model of the user; in addition, the user must have a good user's model of the system. The argument really comes down to presenting an appropriate system image to the user, to assist the user's understanding of what is going on: to keep the user in control. These are topics discussed in Mark's chapter (Chapter 11).

When systems take too much control of the environment, they can cause serious social problems. Many observers have commented on the dehumanizing results of automation in the workplace. In part, this results automatically from the systems that take control away from the users. As Ehn and Kyng (1984) put it, such a result follows naturally when the office or workplace is thought of as a system, so that the computer reduces "the jobs of the workers to algorithmic procedures," minimizing the need for skill or control, and thereby the attractiveness of the workplace. The alternative view, that of tools, offers more control to the worker. For Ehn and Kyng, tools "are under complete and continuous manual control of the worker, are fashioned for the use of the skilled worker to create products of good use quality, and are extensions of the accumulated knowledge of tools and materials of a given labour process." The problem arises over and over again as various workplaces become automated, whether it is the office, the factory, or the aviation cockpit. I believe the difficulties arise from the tension between the natural desire to want intelligent systems that can compensate for our inadequacies and the desire to feel in control of the outcome. Proponents of automatic systems do not wish to make the workplace less pleasant. On the contrary, they wish to improve it. And proponents of tools often wish for the power of the automated systems. (See Chapters 2, 19, and 21 by Bannon for further discussion of these issues.)

The Gulfs of Execution and Evaluation, Revisited

The stages of action play important roles in the analysis of the interface, for they define the psychological stages that need support from the interface.

Moreover, the quality of the interaction probably depends heavily upon the "directness" of the relationship between the psychological and physical variables: just how the Gulfs of Figure 3.1 are bridged.

The theory suggests that two of the mappings of Table 3.1 play critical roles: (a) the mapping from the psychological variables in which the goals are stated to the physical variables upon which the control is actually exerted; and (b) the mapping from the physical variables of the system to psychological variables. The easier and more direct these two mappings, the easier and more pleasant the learning and use of the interface, at least so goes the theory.6 In many ways, the design efforts must focus upon the mappings much more than the stages. This issue forms the focus of much of the discussion in the chapter by Hutchins, Hollan, and Norman (Chapter 5), where it is the mappings that are discussed explicitly as helping bridge the gulf between the demands of the machine and the thought processes and actions of the user. In that chapter the discussion soon turns to the qualitative feeling of control that can develop when one perceives that manipulation is directly operating upon the objects of concern to the user: The actions and the results occur instantaneously upon the same object. That chapter provides a start toward a more formal analysis of these qualitative feelings of "conviviality," or what Hutchins, Hollan, and Norman call "direct engagement" with the task.

The problem of level. When I program a computer, I want a language that matches my level of thought or action. A programming language is precisely in the spirit of a tool: It is a set of operations and construction procedures that allows a machine to do anything doable, unrestricted by conventions or preconceived notions. The power of computers comes about in part because their languages do follow the tool formulation. But not everyone should do this kind of programming. Most people need higher-level tools, tools where the components are already closely matched to the task. Tools that are too primitive, no matter how much their power, are difficult to work with. The primitive commands of a Turing machine are of sufficient power to do any task doable on a computer, but who would ever want to program any real task with them? This is the "Turing tarpit" discussed in Chapter 5 by Hutchins, Hollan, and Norman. On the other hand, tools that are at too high a level are too specialized. An apple peeler is well matched to its purpose, but it has a restricted set of uses. Spelling checkers are powerful tools, but of little aid unless they match the level and intentions of the user, frustrating when they do not.

A major issue in the development of tools is to determine the proper level. How do we determine the proper level of a tool? That is a topic that needs more study. There are strong and legitimate arguments against systems that are too specialized. Equally, there are strong arguments against tools that are too primitive, that operate at too low a level. We want higher-level tools that are crafted to the task. We need lower-level tools in order to create and modify higher-level ones. The level of the tool has to match the level of the intention. Again, easier to say than to do.

6 Streitz (1985) has expressed a similar view, stating that "An interactive computer system (ICS) is the more user-oriented the less discrepancies do exist between the relevant knowledge representations on the user's side and on the side of the ICS."

DESIGN ISSUES

Designing computer systems for people is especially difficult for a number of reasons. First, the number of variables and potential actions is large, possibly in the thousands. Second, the technology available today is limited: limited in the nature of what kinds of input mechanisms exist; limited in the form and variety of output; limited in the amount of affordable memory and computational power. This means that the various mappings (see Table 3.1) are particularly arbitrary. On the other hand, the computer has more potential than any other machine to make visible the operation of the system and, more importantly, to translate the system's operations into psychologically meaningful variables and displays.

But, as the opening sections of this chapter attempted to demonstrate, the problem is intrinsically difficult: It isn’t just computers that are difficult to use, interaction with any complex device is difficult.

Any real system is the result of a series of tradeoffs that balance one design decision against another, that take into account time, effort, and expense. Almost always the benefits of a design decision along one dimension lead to deficits along some other dimension. The designer must consider the wide class of users, the physical limitations, the constraints caused by time and economics, and the limitations of the technology. Moreover, the science and engineering disciplines necessary for a proper design of the interface do not yet exist. So what is the designer to do? What do those of us who are developing the design principles need to do? In this section I review some of the issues, starting with a discussion of the need for approximate theory, moving to a discussion of the general nature of tradeoffs, and then to an exhortation to attend first to the first-order issues. In all of this, the goal is a User-Centered Interface, which means providing intelligent, understandable tools that bridge the gap between people and systems: convivial tools.

What Is It We Want in Computer Design?

Approximate science. In part we need a combined science and engineering discipline that guides the design, construction, and use of systems. An important point to realize is that approximate methods suffice, at least for most applications. This is true of most applied disciplines, from the linear model of transistor circuits to the stress analysis of bridges and buildings: The engineering models are only approximations to reality, but the answers are precise enough for the purpose. Note, of course, that the designer must know both the approximate model and its limits.

Consider an example from psychology: the nature of short-term memory (STM). Even though there is still not an agreed-upon theory of memory, and even though the exact nature of STM is still in doubt, quite a bit is known about the phenomena of STM. The following approximation captures a large portion of the phenomena of STM and is, therefore, a valuable tool for many purposes:

The five-slot approximate model of STM. Short-term memory consists of 5 slots, each capable of holding one item (which might be a pointer to a complex memory structure). Each item decays with a half-life of 1.5 seconds. Most information is lost from STM as a result of interference, new information that takes up the available slots.

Although the approximate model is clearly wrong in all its details, in most practical applications the details of STM do not matter: This approximate model can be very valuable. Other approximate models are easy to find. The time to find something can be approximated by assuming that one object can be examined within the fovea at any one time, and that saccades take place at approximately 5 per second. Reaction and decision times can be approximated by cycles of 100 milliseconds. The book by Card, Moran, and Newell (1983) provides sophisticated examples of the power of approximate models of human cognition. All these models can be criticized at the theoretical level. But they all provide numerical assessments of behavior that will be accurate enough for almost all applications.
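The five-slot approximation is concrete enough to simulate. The sketch below is my own illustration of the interference component only (the slot count comes from the text; the class name and the digit sequence are invented for the example, and decay over time is omitted):

```python
# Toy simulation of the five-slot approximate model of STM: five
# slots, with the oldest item displaced when new information arrives
# and all slots are full (loss by interference). Decay is not modeled.

from collections import deque

class FiveSlotSTM:
    def __init__(self, slots=5):
        # A deque with maxlen drops the oldest entry automatically
        # on append, which is exactly the interference behavior.
        self.slots = deque(maxlen=slots)

    def attend(self, item):
        self.slots.append(item)

    def recall(self):
        return list(self.slots)

stm = FiveSlotSTM()
for digit in "8675309":        # seven items pushed into five slots
    stm.attend(digit)
print(stm.recall())            # ['7', '5', '3', '0', '9']
```

The simulation shows the sense in which the approximation, though wrong in its details, is predictive: present a person with a seven-digit number and the model predicts the earliest digits are the ones at risk.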


Tradeoffs

Assistance for one stage is apt to interfere with another. Any single design technique is apt to have its virtues along one dimension compensated by deficiencies along another. Each technique provides a set of tradeoffs. The lesson applies to almost any aspect of design.

Add extra help for the unskilled user and you run the risk of frustrating the experienced user. Make the display screen larger and some tasks get better, but others get more confused. Display more information, and the time to paint the display goes up, the memory requirement goes up, programs become larger, bulkier, slower. It is well known that different tasks and classes of users have different needs and requirements.

The design choices depend on the technology being used, the class of users, and the goals of the design. The designers must decide which aspects of the interface should gain, which can be left wanting. This focus on the tradeoffs emphasizes that the design problem must be looked at as a whole, not in isolated pieces, for the optimal choice for one part of the problem will probably not be optimal for another. According to this view, there are no correct answers, only tradeoffs among alternatives.

Design is a series of tradeoffs: It might be useful to point out that although there may not be any best solution to a problem in which the needs of different parts conflict, there is a worst solution. And even if no design is "best" along all dimensions, some designs are clearly better than others along all dimensions. It clearly is possible to design a bad system. Equally, it is possible to avoid bad design.

The prototypical tradeoff: information versus time. One basic tradeoff pervades many design issues: Factors that increase informativeness tend to decrease the amount of available workspace and system responsiveness. On the one hand, the more informative and complete the display, the more useful it is when the user has doubts or lacks understanding. On the other hand, the more complete the display, the longer it takes to be displayed and the more space it must occupy physically. This tradeoff of amount of information versus space and time appears in many guises and is one of the major interface issues that must be handled (Norman, 1983a). To appreciate its importance, one has only to examine a few recent commercial offerings, highly touted for their innovative (and impressive) human factors design, that were intended to make the system easy and pleasurable to use, but which so degraded system response time that serious user complaints resulted. The term "user friendly" has taken on a negative meaning as a result of badly engineered tradeoffs, sacrificing utility, efficiency, and ease of use for the benefit of some hypothetical, ill-informed, first-time user.

It is often stated that current computer systems do not provide beginning users with sufficient information. However, the long, informative displays or sequences of questions, options, or menus that may make a system usable by the beginner are disruptive to the experienced user who knows exactly what action is to be specified and wishes to minimize the time and mental effort required to do the specification. The tradeoff here is not only between different needs, but between different stages of activity. After all, the extra information required by the beginner would not bother the experienced users if they could ignore it. However, this information usually cannot be ignored. It is apt to take excess time to be displayed or to use up valuable space on the display, in either case impeding the experienced users in executing and evaluating their actions. We pit the experienced user's requirement for ease of specification against the beginner's requirement for knowledge.

First- and second-order issues. One major tradeoff concerns just which aspects of the system will be worked on. With limited time and people, the design team has to make decisions: Some parts of the system will receive careful attention, others will not. Each different aspect of the design takes time, energy, and resources, none of which is apt to be readily available. Therefore, it is important to be able to distinguish the first-order effects from the secondary effects, the big issues from the little issues.

I argue that it is the conceptual models that are of primary importance: the Design Model, the System Image, the User's Model. If you don't have the right design model, then all else fades to insignificance. Get the major issue right first, the Design Model and the System Image. Then, and only then, worry about the second-order issues.

Example: VISICALC. At the time VISICALC was introduced, it represented a significant breakthrough in design. Bookkeepers and accountants were often wary of computers, especially those who were involved in small and medium-size enterprises where they had to work alone, without the assistance of corps of programmers and computer specialists. VISICALC changed all this. It let the users work on their own terms, putting together a "spreadsheet" of figures, readily changing the numbers and watching the implications appear in the relevant spots.
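The conceptual model behind that experience, change one number and every dependent figure updates, can be sketched in a few lines. This is a toy illustration of the spreadsheet idea, not VISICALC's actual mechanism; the cell names and formulas are invented:

```python
# Minimal spreadsheet-style recalculation: cells hold either literal
# numbers or formulas (functions of the sheet), and every change
# triggers a full recompute, so the implications of a changed number
# "appear in the relevant spots." All names here are illustrative.

def recalc(values, formulas):
    # Recompute formula cells from the literal values. A real
    # spreadsheet tracks dependencies and recomputes selectively;
    # full recomputation is enough for a sketch.
    sheet = dict(values)
    for cell, formula in formulas.items():
        sheet[cell] = formula(sheet)
    return sheet

values = {"sales": 1000.0, "cost": 700.0}
formulas = {
    "profit": lambda s: s["sales"] - s["cost"],
}
print(recalc(values, formulas)["profit"])   # 300.0

values["sales"] = 1200.0                    # change one number...
print(recalc(values, formulas)["profit"])   # 500.0, and it propagates
```

The design point matches Norman's argument: the user manipulates figures in accounting terms (sales, cost, profit), and the propagation machinery stays invisible.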


It would be useful to explore the various design issues involved in the construction of VISICALC. The designers not only were faced with the creation of a conceptualization unlike anything else that existed, but they chose to do it on a relatively small and limited machine, one in which the two major languages available were BASIC and Assembler code, and which could only display 24 rows of 40 columns' worth of uppercase letters and digits. Yet spreadsheets require matrices with hundreds of rows and columns of numerals. The success of VISICALC was due both to the power of the original conceptualization and the clever use of design techniques to overcome the limitations of the machine. Probably an important key to its success was that the design team consisted of just two people, one a user (at the time, he was a student in the Harvard Business School who needed a tool to do business analyses and projections), the other a programmer.

But look at the command structure used in VISICALC: cryptic, obscure, and unmeaningful. It is easy to make errors, difficult to remember the appropriate operations. The choice of command names could be used as an exercise in how not to do things, for they appear to be the typical conventions chosen by computer programmers, for computer programmers. The point of this is to note that VISICALC was a success story, despite the poor choice of command structure. Yes, VISICALC would have been much improved had the commands been better. People would have liked it better, users would have been happier. But the commands were a second-order issue. The designers of VISICALC were working with limited time, manpower, and budget: They were wise in concentrating on the important conceptualizations and letting the problems of command names go for later. I certainly do not wish to advocate the use of poor commands, but the names are second-order issues.

Why was the command structure less important than the overall conceptual structure? Two factors helped:

• The system was self-contained.
• The typical user was a frequent user.

First, VISICALC was a self-contained system. That is, many users of VISICALC, especially the first wave of users, used only VISICALC. They put the floppy disk containing VISICALC into the computer, turned it on, did their work, and then turned off the computer. Therefore, there were no conflicts between the command choices used by VISICALC and other programs. This eliminated one major source of difficulty. Second, most users of VISICALC were practiced, experienced users of the system. The prime audience of the system was the professional who worked with spreadsheet computations on a regular basis. Therefore, the commands would be expected to be used frequently. And whenever there is much experience and practice, lack of meaning and consistency is not so important. Yes, the learning time might be long, but it need take place only once, and then, once the commands have been learned well, they become automatic, causing no further difficulty.

Choices of command names are especially critical when many different systems are to be used, each with its own cryptic, idiosyncratic choice of names.

Problems arise when different systems are involved, oftentimes with similar functions that have different names and conventions, and with similar names that have different meanings. When a system is heavily used by beginners or casual users, then command names take on added significance.

Prescriptions for Design Principles

What is it that we need to do? What should we accomplish? What is the function of Cognitive Engineering? The list of things is long, for here we speak of creating an entirely new discipline, one that combines two already complex fields: psychology and computer science. Moreover, it requires breaking new ground, for our knowledge of what fosters good interactions among people and between people and devices is young, without a well-developed foundation. We are going to need a good, solid technical grounding in the principles of human processing. In addition, we need to understand the more global issues that determine the essence of interaction.

We need to understand the way that hardware affects the interaction: as Chapter 15 by Buxton points out, even subtle changes in hardware can make large changes in the usability of a system. And we need to push the technology into far richer and more expressive domains than has so far been done.

On the one hand, we do need to go deeper into the details of the design. On the other hand, we need to determine some of the higher, overriding principles. The analysis of the stages of interaction moves us in the former direction, into the details of interaction. In this chapter I have raised a number of issues relevant to the latter direction: the higher, more global concerns of human-machine interaction. The general ideas and the global framework lead to a set of overriding design guidelines, not for guiding specific details of the design, but for structuring how the design process might proceed. Here are some prescriptions for design:

- Create a science of user-centered design.

For this, we need principles that can be applied at the time of the design, principles that get the design to a pretty good state the first time around. This requires sufficient design principles and simulation tools for establishing the design of an interface before constructing it. There will still have to be continual iterations, testing, and refinement of the interface (all areas of design need that), but the first pass ought to be close.

- Take interface design seriously as an independent and important problem. It takes at least three kinds of special knowledge to design an interface: first, knowledge of design, of programming, and of the technology; second, knowledge of people, of the principles of mental computation, of communication, and of interaction; and third, expert knowledge of the task that is to be accomplished. Most programmers and designers of computer systems have the first kind of knowledge, but not the second or third. Most psychologists have the second, but not the first or third. And the potential user is apt to have the third, but not the first or second. As a result, if a computer system is to be constructed with a truly user-centered design, it will have to be done in collaboration with people trained in all these areas. We need either especially trained interface specialists or teams of designers, some members expert in the topic domain of the device, some expert in the mechanics of the device, and some expert about people. (This procedure is already in use by a number of companies: often those with the best interfaces, I might add.)

- Separate the design of the interface from the design of the system.

This is the principle of modularization in design. It allows the previous point to work. Today, in most systems, everyone has access to control of the screen or mouse. This means that even the deepest, darkest, most technical systems programmer can send a message to the user when trouble arises. Hence arises my favorite mystical error message, “longjmp botch, core dump,” or du Boulay’s favorite compiler error message, “Fatal error in pass zero” (Draper & Norman, 1984; du Boulay & Matthew, 1984). It is only the interface module that should be in communication with the user, for it is only this module that can know which messages to give, which to defer, where on the screen messages should go without interfering with the main task, or what associated information should be provided. Messages are interruptions (and sometimes reminders), in the sense described in the chapters by Cypher (Chapter 12) and Miyata and Norman (Chapter 13). Because they affect the ongoing task, they have to be presented at the right time, at the right level of specification.

Modularity also allows for change: The system can change without affecting the interface; the interface can change without affecting the system. Different users may need different interfaces, even for the same task and the same system. Evaluations of the usability of the interface may lead to changes (the principle of iterative, interactive design), and this should be possible without disruption to the rest of the system. This is not possible if user interaction is scattered throughout the system; it is possible if the interface is a separate, independent module.

- Do user-centered system design: Start with the needs of the user.

From the point of view of the user, the interface is the system. Concern for the nature of the interaction and for the user: these are the things that should force the design. Let the requirements for the interaction drive the design of the interface, let ideas about the interface drive the technology. The final design is a collaborative effort among many different disciplines, trading off the virtues and deficits of many different design approaches. But user-centered design emphasizes that the purpose of the system is to serve the user, not to use a specific technology, not to be an elegant piece of programming. The needs of the users should dominate the design of the interface, and the needs of the interface should dominate the design of the rest of the system.
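The modularity prescription above lends itself to a small code illustration. The sketch below (Python; all class, method, and message names are hypothetical, invented for this example) shows a single interface module that owns all communication with the user, so that a deep system component never writes to the screen directly:

```python
class InterfaceModule:
    """Sole owner of user communication (hypothetical example class).

    Only this module decides which messages to show, which to defer,
    and when an interruption is worth disturbing the user's main task.
    """

    def __init__(self):
        self.shown = []     # messages actually presented to the user
        self.deferred = []  # low-priority messages held for later

    def report(self, source, message, urgent=False):
        # The interface module, not the system component, decides
        # whether a message interrupts the user.
        if urgent:
            self.shown.append(f"[{source}] {message}")
        else:
            self.deferred.append((source, message))


class Compiler:
    """A deep system component: it never addresses the user itself."""

    def __init__(self, ui):
        self.ui = ui  # the only channel to the user

    def compile(self, source_ok):
        if not source_ok:
            # Rather than emitting its own cryptic "Fatal error in pass
            # zero", the component hands the event to the interface
            # module, which phrases and schedules the message.
            self.ui.report("compiler", "The input could not be compiled.",
                           urgent=True)


ui = InterfaceModule()
Compiler(ui).compile(source_ok=False)
print(ui.shown[0])  # -> [compiler] The input could not be compiled.
```

Swapping in a different InterfaceModule (one that logs to a file, say, or phrases messages for novices) requires no change to Compiler, which is the point of the separation.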

ACKNOWLEDGMENTS

The chapter has been much aided by the comments of numerous people. I thank Eileen Conway for her aid with the illustrations. Julie Norman and Sondra Buffett provided extensive editorial comments for each of the numerous revisions. Liam Bannon, Steve Draper, and Dave Owen provided a number of useful comments and suggestions. Jonathan Grudin was most savage of the lot, and therefore the most helpful. And the Asilomar Workshop group provided a thorough reading, followed by two hours of intensive commentary. All this effort on the part of the critics led to major revision and reorganization. For all this assistance, I am grateful.

THE DESIGN OF EVERYDAY THINGS

THE PSYCHOLOGY OF EVERYDAY ACTIONS

During my family’s stay in England, we rented a furnished house while the owners were away. One day, our landlady returned to the house to get some personal papers. She walked over to the old, metal filing cabinet and attempted to open the top drawer. It wouldn’t open. She pushed it forward and backward, right and left, up and down, without success. I offered to help. I wiggled the drawer. Then I twisted the front panel, pushed down hard, and banged the front with the palm of one hand. The cabinet drawer slid open. “Oh,” she said, “I’m sorry. I am so bad at mechanical things.” No, she had it backward. It is the mechanical thing that should be apologizing, perhaps saying, “I’m sorry. I am so bad with people.”

My landlady had two problems. First, although she had a clear goal (retrieve some personal papers) and even a plan for achieving that goal (open the top drawer of the filing cabinet, where those papers are kept), once that plan failed, she had no idea of what to do. But she also had a second problem: she thought the problem lay in her own lack of ability: she blamed herself, falsely.

How was I able to help? First, I refused to accept the false accusation that it was the fault of the landlady: to me, it was clearly a fault in the mechanics of the old filing cabinet that prevented the drawer from opening. Second, I had a conceptual model of how the cabinet worked, with an internal mechanism that held the door shut in normal usage, and the belief that the drawer mechanism was probably out of alignment. This conceptual model gave me a plan: wiggle the drawer. That failed. That caused me to modify my plan: wiggling may have been appropriate but not forceful enough, so I resorted to brute force to try to twist the cabinet back into its proper alignment. This felt good to me—the cabinet drawer moved slightly—but it still didn’t open. So I resorted to the most powerful tool employed by experts the world around—I banged on the cabinet. And yes, it opened. In my mind, I decided (without any evidence) that my hit had jarred the mechanism sufficiently to allow the drawer to open.

This example highlights the themes of this chapter. First, how do people do things? It is easy to learn a few basic steps to perform operations with our technologies (and yes, even filing cabinets are technology). But what happens when things go wrong? How do we detect that they aren’t working, and then how do we know what to do? To help understand this, I first delve into human psychology and a simple conceptual model of how people select and then evaluate their actions. This leads the discussion to the role of understanding (via a conceptual model) and of emotions: pleasure when things work smoothly and frustration when our plans are thwarted. Finally, I conclude with a summary of how the lessons of this chapter translate into principles of design.

How People Do Things: The Gulfs of Execution and Evaluation

When people use something, they face two gulfs: the Gulf of Execution, where they try to figure out how it operates, and the Gulf of Evaluation, where they try to figure out what happened (Figure 2.1). The role of the designer is to help people bridge the two gulfs.

In the case of the filing cabinet, there were visible elements that helped bridge the Gulf of Execution when everything was working perfectly. The drawer handle clearly signified that it should be pulled and the slider on the handle indicated how to release the catch that normally held the drawer in place. But when these operations failed, there then loomed a big gulf: what other operations could be done to open the drawer?

The Gulf of Evaluation was easily bridged, at first. That is, the catch was released, the drawer handle pulled, yet nothing happened. The lack of action signified a failure to reach the goal. But when other operations were tried, such as my twisting and pulling, the filing cabinet provided no more information about whether I was getting closer to the goal.

The Gulf of Evaluation reflects the amount of effort that the person must make to interpret the physical state of the device and to determine how well the expectations and intentions have been met. The gulf is small when the device provides information about its state in a form that is easy to get, is easy to interpret, and matches the way the person thinks about the system. What are the major design elements that help bridge the Gulf of Evaluation? Feedback and a good conceptual model.

The gulfs are present for many devices. Interestingly, many people do experience difficulties, but explain them away by blaming themselves. In the case of things they believe they should be capable of using—water faucets, refrigerator temperature controls, stove tops—they simply think, “I’m being stupid.” Alternatively, for complicated-looking devices—sewing machines, washing machines, digital watches, or almost any digital controls—they simply give up, deciding that they are incapable of understanding them. Both explanations are wrong. These are the things of everyday household use. None of them has a complex underlying structure. The difficulties reside in their design, not in the people attempting to use them.

FIGURE 2.1. The Gulfs of Execution and Evaluation. When people encounter a device, they face two gulfs: the Gulf of Execution, where they try to figure out how to use it, and the Gulf of Evaluation, where they try to figure out what state it is in and whether their actions got them to their goal.

How can the designer help bridge the two gulfs? To answer that question, we need to delve more deeply into the psychology of human action. But the basic tools have already been discussed: We bridge the Gulf of Execution through the use of signifiers, constraints, mappings, and a conceptual model. We bridge the Gulf of Evaluation through the use of feedback and a conceptual model.

The Seven Stages of Action

There are two parts to an action: executing the action and then evaluating the results: doing and interpreting. Both execution and evaluation require understanding: how the item works and what results it produces. Both execution and evaluation can affect our emotional state.

Suppose I am sitting in my armchair, reading a book. It is dusk, and the light is getting dimmer and dimmer. My current activity is reading, but that goal is starting to fail because of the decreasing illumination. This realization triggers a new goal: get more light. How do I do that? I have many choices. I could open the curtains, move so that I sit where there is more light, or perhaps turn on a nearby light. This is the planning stage, determining which of the many possible plans of action to follow. But even when I decide to turn on the nearby light, I still have to determine how to get it done. I could ask someone to do it for me, I could use my left hand or my right. Even after I have decided upon a plan, I still have to specify how I will do it. Finally, I must execute—do—the action. When I am doing a frequent act, one for which I am quite experienced and skilled, most of these stages are subconscious. When I am still learning how to do it, determining the plan, specifying the sequence, and interpreting the result are conscious.

Suppose I am driving in my car and my action plan requires me to make a left turn at a street intersection. If I am a skilled driver, I don’t have to give much conscious attention to specify or perform the action sequence. I think “left” and smoothly execute the required action sequence. But if I am just learning to drive, I have to think about each separate component of the action. I must apply the brakes and check for cars behind and around me, cars and pedestrians in front of me, and whether there are traffic signs or signals that I have to obey. I must move my feet back and forth between pedals and my hands to the turn signals and back to the steering wheel (while I try to remember just how my instructor told me I should position my hands while making a turn), and my visual attention is divided among all the activity around me, sometimes looking directly, sometimes rotating my head, and sometimes using the rear- and side-view mirrors. To the skilled driver, it is all easy and straightforward. To the beginning driver, the task seems impossible.

The specific actions bridge the gap between what we would like to have done (our goals) and all possible physical actions to achieve those goals. After we specify what actions to make, we must actually do them—the stages of execution. There are three stages of execution that follow from the goal: plan, specify, and perform (the left side of Figure 2.2). Evaluating what happened has three stages: first, perceiving what happened in the world; second, trying to make sense of it (interpreting it); and, finally, comparing what happened with what was wanted (the right side of Figure 2.2). There we have it. Seven stages of action: one for goals, three for execution, and three for evaluation (Figure 2.2).

FIGURE 2.2. The Seven Stages of the Action Cycle. Putting all the stages together yields the three stages of execution (plan, specify, and perform), three stages of evaluation (perceive, interpret, and compare), and, of course, the goal: seven stages in all.

  1. Goal (form the goal)
  2. Plan (the action)
  3. Specify (an action sequence)
  4. Perform (the action sequence)
  5. Perceive (the state of the world)
  6. Interpret (the perception)
  7. Compare (the outcome with the goal)
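The list above can be written down as a simple data structure. This is only an illustrative encoding (Python; the constant and function names are mine — only the stage names come from the text):

```python
# The seven stages of the action cycle as data: one stage for the
# goal, three for execution, three for evaluation.
GOAL = ("goal",)
EXECUTION = ("plan", "specify", "perform")         # left side of Figure 2.2
EVALUATION = ("perceive", "interpret", "compare")  # right side of Figure 2.2

ACTION_CYCLE = GOAL + EXECUTION + EVALUATION


def entry_point(trigger):
    """Illustrative helper: goal-driven behavior starts at the top of
    the cycle; event-driven (data-driven) behavior starts with the
    world, i.e., with the first evaluation stage."""
    return "goal" if trigger == "goal-driven" else "perceive"
```

The two entry points correspond to the goal-driven and event-driven cases discussed in the text.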

The seven-stage action cycle is simplified, but it provides a useful framework for understanding human action and for guiding design. It has proven to be helpful in designing interaction. Not all of the activity in the stages is conscious. Goals tend to be, but even they may be subconscious. We can do many actions, repeatedly cycling through the stages while being blissfully unaware that we are doing so. It is only when we come across something new or reach some impasse, some problem that disrupts the normal flow of activity, that conscious attention is required.

Most behavior does not require going through all stages in sequence; however, most activities will not be satisfied by single actions. There must be numerous sequences, and the whole activity may last hours or even days. There are multiple feedback loops in which the results of one activity are used to direct further ones, in which goals lead to subgoals, and plans lead to subplans. There are activities in which goals are forgotten, discarded, or reformulated.

Let’s go back to my act of turning on the light. This is a case of event-driven behavior: the sequence starts with the world, causing evaluation of the state and the formulation of a goal. The trigger was an environmental event: the lack of light, which made reading difficult. This led to a violation of the goal of reading, so it led to a subgoal: get more light. But reading was not the high-level goal. For each goal, one has to ask, “Why is that the goal?” Why was I reading? I was trying to prepare a meal using a new recipe, so I needed to reread it before I started. Reading was thus a subgoal. But cooking was itself a subgoal. I was cooking in order to eat, which had the goal of satisfying my hunger. So the hierarchy of goals is roughly: satisfy hunger; eat; cook; read cookbook; get more light. This is called a root cause analysis: asking “Why?” until the ultimate, fundamental cause of the activity is reached.
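Root cause analysis, repeatedly asking “Why?” until the fundamental goal is reached, can be sketched as a walk up a goal hierarchy. The dictionary below encodes the reading-light example from the text; the function name and the encoding itself are mine, for illustration only:

```python
# Each entry answers "Why is that the goal?" for the goal on the left,
# using the chapter's reading-light example. The root goal (satisfy
# hunger) has no entry: there is no further "why" above it.
WHY = {
    "get more light": "read cookbook",
    "read cookbook": "cook",
    "cook": "eat",
    "eat": "satisfy hunger",
}


def root_cause(goal, why=WHY):
    """Keep asking 'Why?' until no higher goal remains; return the
    chain from the immediate subgoal up to the fundamental cause."""
    chain = [goal]
    while goal in why:
        goal = why[goal]
        chain.append(goal)
    return chain


print(" -> ".join(root_cause("get more light")))
# -> get more light -> read cookbook -> cook -> eat -> satisfy hunger
```

The chain printed is the text’s goal hierarchy read bottom-up, from the subgoal that triggered the action to the ultimate cause.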

The action cycle can start from the top, by establishing a new goal, in which case we call it goal-driven behavior. In this situation, the cycle starts with the goal and then goes through the three stages of execution. But the action cycle can also start from the bottom, triggered by some event in the world, in which case we call it either data-driven or event-driven behavior. In this situation, the cycle starts with the environment, the world, and then goes through the three stages of evaluation.

For many everyday tasks, goals and intentions are not well specified: they are opportunistic rather than planned. Opportunistic actions are those in which the behavior takes advantage of circumstances. Rather than engage in extensive planning and analysis, we go about the day’s activities and do things as opportunities arise. Thus, we may not have planned to try a new café or to ask a question of a friend. Rather, we go through the day’s activities, and if we find ourselves near the café or encountering the friend, then we allow the opportunity to trigger the appropriate activity. Otherwise, we might never get to that café or ask our friend the question. For crucial tasks we make special efforts to ensure that they get done. Opportunistic actions are less precise and certain than specified goals and intentions, but they result in less mental effort, less inconvenience, and perhaps more interest. Some of us adjust our lives around the expectation of opportunities. And sometimes, even for goal-driven behavior, we try to create world events that will ensure that the sequence gets completed. For example, sometimes when I must do an important task, I ask someone to set a deadline for me. I use the approach of that deadline to trigger the work. It may only be a few hours before the deadline that I actually get to work and do the job, but the important point is that it does get done. This self-triggering of external drivers is fully compatible with the seven-stage analysis.

The seven stages provide a guideline for developing new products or services. The gulfs are obvious places to start, for either gulf, whether of execution or evaluation, is an opportunity for product enhancement. The trick is to develop observational skills to detect them. Most innovation is done as an incremental enhancement of existing products. What about radical ideas, ones that introduce new product categories to the marketplace? These come about by reconsidering the goals, and always asking what the real goal is: what is called the root cause analysis.

Harvard Business School marketing professor Theodore Levitt once pointed out, “People don’t want to buy a quarter-inch drill. They want a quarter-inch hole!” Levitt’s example of the drill, implying that the goal is really a hole, is only partially correct, however. When people go to a store to buy a drill, that is not their real goal. But why would anyone want a quarter-inch hole? Clearly that is an intermediate goal. Perhaps they wanted to hang shelves on the wall. Levitt stopped too soon.

Once you realize that they don’t really want the drill, you realize that perhaps they don’t really want the hole, either: they want to install their bookshelves. Why not develop methods that don’t require holes? Or perhaps books that don’t require bookshelves. (Yes, I know: electronic books, e-books.)

Human Thought: Mostly Subconscious

Why do we need to know about the human mind? Because things are designed to be used by people, and without a deep understanding of people, the designs are apt to be faulty, difficult to use, difficult to understand. That is why it is useful to consider the seven stages of action. The mind is more difficult to comprehend than actions. Most of us start by believing we already understand both human behavior and the human mind. After all, we are all human: we have all lived with ourselves all of our lives, and we like to think we understand ourselves. But the truth is, we don’t. Most of human behavior is a result of subconscious processes. We are unaware of them. As a result, many of our beliefs about how people behave—including beliefs about ourselves—are wrong. That is why we have the multiple social and behavioral sciences, with a good dash of mathematics, economics, computer science, information science, and neuroscience.

Consider the following simple experiment. Do all three steps:

  1. Wiggle the second finger of your hand.
  2. Wiggle the third finger of the same hand.
  3. Describe what you did differently those two times.

On the surface, the answer seems simple: I thought about moving my fingers and they moved. The difference is that I thought about a different finger each time. Yes, that’s true. But how did that thought get transmitted into action, into the commands that caused different muscles in the arm to control the tendons that wiggled the fingers? This is completely hidden from consciousness. The human mind is immensely complex, having evolved over a long period with many specialized structures. The study of the mind is the subject of multiple disciplines, including the behavioral and social sciences, cognitive science, neuroscience, philosophy, and the information and computer sciences. Despite many advances in our understanding, much still remains mysterious, yet to be learned. One of the mysteries concerns the nature of and distinction between those activities that are conscious and those that are not. Most of the brain’s operations are subconscious, hidden beneath our awareness. It is only the highest level, what I call reflective, that is conscious.

Conscious attention is necessary to learn most things, but after the initial learning, continued practice and study, sometimes for thousands of hours over a period of years, produces what psychologists call “overlearning.” Once skills have been overlearned, performance appears to be effortless, done automatically, with little or no awareness. For example, answer these questions:

What is the phone number of a friend?
What is Beethoven’s phone number?
What is the capital of:

Think about how you answered these questions. The answers you knew come immediately to mind, but with no awareness of how that happened. You simply “know” the answer. Even the ones you got wrong came to mind without any awareness. You might have been aware of some doubt, but not of how the name entered your consciousness. As for the countries for which you didn’t know the answer, you probably knew you didn’t know those immediately, without effort. Even if you knew you knew, but couldn’t quite recall it, you didn’t know how you knew that, or what was happening as you tried to remember.

You might have had trouble with the phone number of a friend because most of us have turned over to our technology the job of remembering phone numbers. I don’t know anybody’s phone number—I barely remember my own. When I wish to call someone, I just do a quick search in my contact list and have the telephone place the call. Or I just push the “2” button on the phone for a few seconds, which autodials my home. Or in my auto, I can simply speak: “Call home.” What’s the number? I don’t know: my technology knows. Do we count our technology as an extension of our memory systems? Of our thought processes? Of our mind?

What about Beethoven’s phone number? If I asked my computer, it would take a long time, because it would have to search all the people I know to see whether any one of them was Beethoven. But you immediately discarded the question as nonsensical. You don’t personally know Beethoven. And anyway, he is dead. Besides, he died in the early 1800s and the phone wasn’t invented until the late 1800s. How do we know what we do not know so rapidly? Yet some things that we do know can take a long time to retrieve. For example, answer this:

In the house you lived in three houses ago, as you entered the front door, was the doorknob on the left or right?

Now you have to engage in conscious, reflective problem solving, first to retrieve just which house is being talked about, and then what the correct answer is. Most people can determine the house, but have difficulty answering the question because they can readily imagine the doorknob on both sides of the door. The way to solve this problem is to imagine doing some activity, such as walking up to the front door while carrying heavy packages with both hands: how do you open the door? Alternatively, visualize yourself inside the house, rushing to the front door to open it for a visitor.

Usually one of these imagined scenarios provides the answer. But note how different the memory retrieval for this question was from the retrieval for the others. All these questions involved long-term memory, but in very different ways. The earlier questions were memory for factual information, what is called declarative memory. The last question could have been answered factually, but is usually most easily answered by recalling the activities performed to open the door. This is called procedural memory. I return to a discussion of human memory in Chapter 3.

Walking, talking, reading. Riding a bicycle or driving a car. Singing. All of these skills take considerable time and practice to master, but once mastered, they are often done quite automatically. For experts, only especially difficult or unexpected situations require conscious attention.

Because we are only aware of the reflective level of conscious processing, we tend to believe that all human thought is conscious. But it isn’t. We also tend to believe that thought can be separated from emotion. This is also false. Cognition and emotion cannot be separated. Cognitive thoughts lead to emotions: emotions drive cognitive thoughts. The brain is structured to act upon the world, and every action carries with it expectations, and these expectations drive emotions. That is why much of language is based on physical metaphors, why the body and its interaction with the environment are essential components of human thought. Emotion is highly underrated. In fact, the emotional system is a powerful information processing system that works in tandem with cognition. Cognition attempts to make sense of the world: emotion assigns value. It is the emotional system that determines whether a situation is safe or threatening, whether something that is happening is good or bad, desirable or not. Cognition provides understanding: emotion provides value judgments. A human without a working emotional system has difficulty making choices. A human without a cognitive system is dysfunctional.

Because much human behavior is subconscious—that is, it occurs without conscious awareness—we often don’t know what we are about to do, say, or think until after we have done it. It’s as if we had two minds: the subconscious and the conscious, which don’t always talk to each other. Not what you have been taught? True, nonetheless. More and more evidence is accumulating that we use logic and reason after the fact, to justify our decisions to ourselves (to our conscious minds) and to others. Bizarre? Yes, but don’t protest: enjoy it.

Subconscious thought matches patterns, finding the best possible match of one’s past experience to the current one. It proceeds rapidly and automatically, without effort. Subconscious processing is one of our strengths. It is good at detecting general trends, at recognizing the relationship between what we now experience and what has happened in the past. And it is good at generalizing, at making predictions about the general trend, based on few examples. But subconscious thought can find matches that are inappropriate or wrong, and it may not distinguish the common from the rare. Subconscious thought is biased toward regularity and structure, and it is limited in formal power. It may not be capable of symbolic manipulation, of careful reasoning through a sequence of steps.

Conscious thought is quite different. It is slow and labored. Here is where we slowly ponder decisions, think through alternatives, compare different choices. Conscious thought considers first this approach, then that: comparing, rationalizing, finding explanations. Formal logic, mathematics, decision theory: these are the tools of conscious thought. Both conscious and subconscious modes of thought are powerful and essential aspects of human life. Both can provide insightful leaps and creative moments. And both are subject to errors, misconceptions, and failures.

Emotion interacts with cognition biochemically, bathing the brain with hormones, transmitted either through the bloodstream or through ducts in the brain, modifying the behavior of brain cells. Hormones exert powerful biases on brain operation. Thus, in tense, threatening situations, the emotional system triggers the release of hormones that bias the brain to focus upon relevant parts of the environment. The muscles tense in preparation for action. In calm, nonthreatening situations, the emotional system triggers the release of hormones that relax the muscles and bias the brain toward exploration and creativity. Now the brain is more apt to notice changes in the environment, to be distracted by events, and to piece together events and knowledge that might have seemed unrelated earlier.

A positive emotional state is ideal for creative thought, but it is not very well suited for getting things done. Too much, and we call the person scatterbrained, flitting from one topic to another, unable to finish one thought before another comes to mind. A brain in a negative emotional state provides focus: precisely what is needed to maintain attention on a task and finish it. Too much, however, and we get tunnel vision, where people are unable to look beyond their narrow point of view. Both the positive, relaxed state and the anxious, negative, and tense state are valuable and powerful tools for human creativity and action. The extremes of both states, however, can be dangerous.

TABLE 2.1. Subconscious and Conscious Systems of Cognition

  Subconscious                 Conscious
  Fast                         Slow
  Automatic                    Controlled
  Multiple resources           Limited resources
  Controls skilled behavior    Invoked for novel situations: when
                               learning, when in danger, when things
                               go wrong

Human Cognition and Emotion

The mind and brain are complex entities, still the topic of considerable scientific research. One valuable explanation of the levels of processing within the brain, applicable to both cognitive and emotional processing, is to think of three different levels of processing, each quite different from the other, but all working together in concert. Although this is a gross oversimplification of the actual processing, it is a good enough approximation to provide guidance in understanding human behavior. The approach I use here comes from my book Emotional Design. There, I suggested that a useful approximate model of human cognition and emotion is to consider three levels of processing: visceral, behavioral, and reflective.

THE VISCERAL LEVEL

The most basic level of processing is called visceral. This is sometimes referred to as “the lizard brain.” All people have the same basic visceral responses. These are part of the basic protective mechanisms of the human affective system, making quick judgments about the environment: good or bad, safe or dangerous. The visceral system allows us to respond quickly and subconsciously, without conscious awareness or control. The basic biology of the visceral system minimizes its ability to learn. Visceral learning takes place primarily by sensitization or desensitization through such mechanisms as adaptation and classical conditioning. Visceral responses are fast and automatic. They give rise to the startle reflex for novel, unexpected events and to such genetically programmed behavior as fear of heights, dislike of the dark or very noisy environments, dislike of bitter tastes and the liking of sweet tastes, and so on. Note that the visceral level responds to the immediate present and produces an affective state, relatively unaffected by context or history. It simply assesses the situation: no cause is assigned, no blame, and no credit.

The visceral level is tightly coupled to the body’s musculature—the motor system. This is what causes animals to fight or flee, or to relax. An animal’s (or person’s) visceral state can often be read by analyzing the tension of the body: tense means a negative state; relaxed, a positive state. Note, too, that we often determine our own body state by noting our own musculature. A common self-report might be something like, “I was tense, my fists clenched, and I was sweating.”

FIGURE 2.2. Three Levels of Processing: Visceral, Behavioral, and Reflective. Visceral and behavioral levels are subconscious and the home of basic emotions. The reflective level is where conscious thought and decision-making reside, as well as the highest level of emotions.

Visceral responses are fast and completely subconscious. They are sensitive only to the current state of things. Most scientists do not call these emotions: they are precursors to emotion. Stand at the edge of a cliff and you will experience a visceral response. Or bask in the warm, comforting glow after a pleasant experience, perhaps a nice meal.

For designers, the visceral response is about immediate perception: the pleasantness of a mellow, harmonious sound or the jarring, irritating scratch of fingernails on a rough surface. Here is where style matters: appearances, whether sound or sight, touch or smell, drive the visceral response. This has nothing to do with how usable, effective, or understandable the product is. It is all about attraction or repulsion. Great designers use their aesthetic sensibilities to drive these visceral responses.

Engineers and other logical people tend to dismiss the visceral response as irrelevant. Engineers are proud of the inherent quality of their work and dismayed when inferior products sell better “just because they look better.” But all of us make these kinds of judgments, even those very logical engineers. That’s why they love some of their tools and dislike others. Visceral responses matter.

THE BEHAVIORAL LEVEL

The behavioral level is the home of learned skills, triggered by situations that match the appropriate patterns. Actions and analyses at this level are largely subconscious. Even though we are usually aware of our actions, we are often unaware of the details. When we speak, we often do not know what we are about to say until our conscious mind (the reflective part of the mind) hears ourselves uttering the words. When we play a sport, we are prepared for action, but our responses occur far too quickly for conscious control: it is the behavioral level that takes control.

When we perform a well-learned action, all we have to do is think of the goal and the behavioral level handles all the details: the conscious mind has little or no awareness beyond creating the desire to act. It’s actually interesting to try this. Move the left hand, then the right. Stick out your tongue, or open your mouth. What did you do? You don’t know. All you know is that you “willed” the action and the correct thing happened. You can even make the actions more complex. Pick up a cup, and then with the same hand, pick up several more items. You automatically adjust the fingers and the hand’s orientation to make the task possible. You only need to pay conscious attention if the cup holds some liquid that you wish to avoid spilling. But even in that case, the actual control of the muscles is beneath conscious perception: concentrate on not spilling and the hands automatically adjust.

For designers, the most critical aspect of the behavioral level is that every action is associated with an expectation. Expect a positive outcome and the result is a positive affective response (a “positive valence,” in the scientific literature). Expect a negative outcome and the result is a negative affective response (a “negative valence”): dread and hope, anxiety and anticipation. The information in the feedback loop of evaluation confirms or disconfirms the expectations, resulting in satisfaction or relief, disappointment or frustration.

Behavioral states are learned. They give rise to a feeling of control when there is good understanding and knowledge of results, and to frustration and anger when things do not go as planned, especially when neither the reason nor the possible remedies are known. Feedback provides reassurance, even when it indicates a negative result. A lack of feedback creates a feeling of lack of control, which can be unsettling. Feedback is critical to managing expectations, and good design provides this. Feedback—knowledge of results—is how expectations are resolved and is critical to learning and the development of skilled behavior.

Expectations play an important role in our emotional lives. This is why drivers tense when trying to get through an intersection before the light turns red, or students become highly anxious before an exam. The release of the tension of expectation creates a sense of relief. The emotional system is especially responsive to changes in states—so an upward change is interpreted positively even if it is only from a very bad state to a not-so-bad state, just as a downward change is interpreted negatively even if it is from an extremely positive state to one only somewhat less positive.

THE REFLECTIVE LEVEL

The reflective level is the home of conscious cognition. As a consequence, this is where deep understanding develops, where reasoning and conscious decision-making take place. The visceral and behavioral levels are subconscious and, as a result, they respond rapidly, but without much analysis. Reflection is cognitive, deep, and slow. It often occurs after the events have happened. It is a reflection or looking back over them, evaluating the circumstances, actions, and outcomes, often assessing blame or responsibility. The highest levels of emotions come from the reflective level, for it is here that causes are assigned and where predictions of the future take place. Adding causal elements to experienced events leads to such emotional states as guilt and pride (when we assume ourselves to be the cause) and blame and praise (when others are thought to be the cause). Most of us have probably experienced the extreme highs and lows of anticipated future events, all imagined by a runaway reflective cognitive system but intense enough to create the physiological responses associated with extreme anger or pleasure. Emotion and cognition are tightly intertwined.

DESIGN MUST TAKE PLACE AT ALL LEVELS: VISCERAL, BEHAVIORAL, AND REFLECTIVE

To the designer, reflection is perhaps the most important of the levels of processing. Reflection is conscious, and the emotions produced at this level are the most protracted: those that assign agency and cause, such as guilt and blame or praise and pride. Reflective responses are part of our memory of events. Memories last far longer than the immediate experience or the period of usage, which are the domains of the visceral and behavioral levels. It is reflection that drives us to recommend a product to others—or perhaps to advise them to avoid it.

Reflective memories are often more important than reality. If we have a strongly positive visceral response but disappointing usability problems at the behavioral level, when we reflect back upon the product, the reflective level might very well weigh the positive response strongly enough to overlook the severe behavioral difficulties (hence the phrase, “Attractive things work better”). Similarly, too much frustration, especially toward the ending stage of use, and our reflections about the experience might overlook the positive visceral qualities. Advertisers hope that the strong reflective value associated with a well-known, highly prestigious brand might overwhelm our judgment, despite a frustrating experience in using the product. Vacations are often remembered with fondness, despite the evidence from diaries of repeated discomfort and anguish.

All three levels of processing work together. All play essential roles in determining a person’s like or dislike of a product or service. One nasty experience with a service provider can spoil all future experiences. One superb experience can make up for past deficiencies. The behavioral level, which is the home of interaction, is also the home of all expectation-based emotions: of hope and joy, frustration and anger. Understanding arises at a combination of the behavioral and reflective levels. Enjoyment requires all three. Designing at all three levels is so important that I devote an entire book to the topic, Emotional Design.

In psychology, there has been a long debate about which hap pens first: emotion or cognition. Do we run and flee because some event happened that made us afraid? Or are we afraid because our conscious, reflective mind notices that we are running? The three-level analysis shows that both of these ideas can be correct. Sometimes the emotion comes first. An unexpected loud noise can cause automatic visceral and behavioral responses that make us flee. Then, the reflective system observes itself fleeing and deduces that it is afraid. The actions of running and fleeing occur first and set off the interpretation of fear.

But sometimes cognition occurs first. Suppose the street where we are walking leads to a dark and narrow section. Our reflective system might conjure numerous imagined threats that await us. At some point, the imagined depiction of potential harm is large enough to trigger the behavioral system, causing us to turn, run, and flee. Here is where the cognition sets off the fear and the action.

Most products do not cause fear, running, or fleeing, but badly designed devices can induce frustration and anger, a feeling of helplessness and despair, and possibly even hate. Well-designed devices can induce pride and enjoyment, a feeling of being in control and pleasure—possibly even love and attachment. Amusement parks are experts at balancing these conflicting emotional responses, providing rides and fun houses that trigger fear responses from the visceral and behavioral levels, while all the time providing reassurance at the reflective level that the park would never subject anyone to real danger.

All three levels of processing work together to determine a person’s cognitive and emotional state. High-level reflective cognition can trigger lower-level emotions. Lower-level emotions can trigger higher-level reflective cognition.

The Seven Stages of Action and the Three Levels of Processing

The stages of action can readily be associated with the three different levels of processing, as shown in Figure 2.4. At the lowest level are the visceral responses of calmness or anxiety when approaching a task or evaluating the state of the world. In the middle are the behavioral emotions driven by expectations on the execution side—for example, hope and fear—and emotions driven by the confirmation of those expectations on the evaluation side—for example, relief or despair. At the highest level are the reflective emotions, ones that assess the results in terms of the presumed causal agents and the consequences, both immediate and long-term. Here is where satisfaction and pride occur, or perhaps blame and anger.

One important emotional state is the one that accompanies complete immersion into an activity, a state that the social scientist Mihaly Csikszentmihalyi has labeled “flow.” Csikszentmihalyi has long studied how people interact with their work and play, and how their lives reflect this intermix of activities. When in the flow state, people lose track of time and the outside environment.

FIGURE 2.4. Levels of Processing and the Stages of the Action Cycle. Visceral response is at the lowest level: the control of simple muscles and sensing the state of the world and body. The behavioral level is about expectations, so it is sensitive to the expectations of the action sequence and then the interpretations of the feedback. The reflective level is part of the goal- and plan-setting activity as well as affected by the comparison of expectations with what has actually happened.

They are at one with the task they are performing. The task, moreover, is at just the proper level of difficulty: difficult enough to provide a challenge and require continued attention, but not so difficult that it invokes frustration and anxiety.

Csikszentmihalyi’s work shows how the behavioral level creates a powerful set of emotional responses. Here, the subconscious expectations established by the execution side of the action cycle set up emotional states dependent upon those expectations. When the results of our actions are evaluated against expectations, the resulting emotions affect our feelings as we continue through the many cycles of action. An easy task, far below our skill level, makes it so easy to meet expectations that there is no challenge. Very little or no processing effort is required, which leads to apathy or boredom. A difficult task, far above our skill, leads to so many failed expectations that it causes frustration, anxiety, and helplessness. The flow state occurs when the challenge of the activity just slightly exceeds our skill level, so full attention is continually required. Flow requires that the activity be neither too easy nor too difficult relative to our level of skill. The constant tension coupled with continual progress and success can be an engaging, immersive experience sometimes lasting for hours.

People as Storytellers

Now that we have explored the way that actions get done and the three different levels of processing that integrate cognition and emotion, we are ready to look at some of the implications.

People are innately disposed to look for causes of events, to form explanations and stories. That is one reason storytelling is such a persuasive medium. Stories resonate with our experiences and provide examples of new instances. From our experiences and the stories of others we tend to form generalizations about the way people behave and things work. We attribute causes to events, and as long as these cause-and-effect pairings make sense, we accept them and use them for understanding future events. Yet these causal attributions are often erroneous. Sometimes they implicate the wrong causes, and for some things that happen, there is no single cause: rather, a complex chain of events all contribute to the result, and had any one of those events not occurred, the result would be different. But even when there is no single causal act, that doesn’t stop people from assigning one.

Conceptual models are a form of story, resulting from our predisposition to find explanations. These models are essential in helping us understand our experiences, predict the outcome of our actions, and handle unexpected occurrences. We base our models on whatever knowledge we have, real or imaginary, naive or sophisticated. Conceptual models are often constructed from fragmentary evidence, with only a poor understanding of what is happening, and with a kind of naive psychology that postulates causes, mechanisms, and relationships even where there are none. Some faulty models lead to the frustrations of everyday life, as in the case of my unsettable refrigerator, where my conceptual model of its operation (see again Figure 1.10A) did not correspond to reality (Figure 1.10B). Far more serious are faulty models of such complex systems as an industrial plant or passenger airplane. Misunderstanding there can lead to devastating accidents.

Consider the thermostat that controls room heating and cooling systems. How does it work? The average thermostat offers almost no evidence of its operation except in a highly roundabout manner. All we know is that if the room is too cold, we set a higher temperature into the thermostat. Eventually we feel warmer. Note that the same thing applies to the temperature control for almost any device whose temperature is to be regulated. Want to bake a cake? Set the oven thermostat and the oven goes to the desired temperature.

If you are in a cold room, in a hurry to get warm, will the room heat more quickly if you turn the thermostat to its maximum setting? Or if you want the oven to reach its working temperature faster, should you turn the temperature dial all the way to maximum, then turn it down once the desired temperature is reached? Or to cool a room most quickly, should you set the air conditioner thermostat to its lowest temperature setting?

If you think that the room or oven will cool or heat faster if the thermostat is turned all the way to the maximum setting, you are wrong—you hold an erroneous folk theory of the heating and cooling system. One commonly held folk theory of the working of a thermostat is that it is like a valve: the thermostat controls how much heat (or cold) comes out of the device. Hence, to heat or cool something most quickly, set the thermostat so that the device is on maximum. The theory is reasonable, and there exist devices that operate like this, but neither the heating nor cooling equipment for a home, nor the heating element of a traditional oven, is one of them. In most homes, the thermostat is just an on-off switch. Moreover, most heating and cooling devices are either fully on or fully off: all or nothing, with no in-between states. As a result, the thermostat turns the heater, oven, or air conditioner completely on, at full power, until the temperature setting on the thermostat is reached. Then it turns the unit completely off. Setting the thermostat at one extreme cannot affect how long it takes to reach the desired temperature. Worse, because the unit will now keep running until the extreme setting is reached, rather than shutting off at the temperature you actually want, the temperature invariably overshoots the target. If people were uncomfortably cold or hot before, they will become uncomfortable in the other direction, wasting considerable energy in the process.
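The difference between the valve folk theory and a real on-off thermostat can be made concrete with a few lines of code. This is only an illustrative simulation: the function names, heating rates, and time steps are invented for the example and do not describe any actual device.

```python
# A minimal sketch of an on-off ("bang-bang") thermostat, the kind most
# homes actually have.  All names and constants here are illustrative.

def bang_bang_step(room_temp, setpoint, heater_power=0.5, heat_loss=0.1):
    """One time step: the heater is either fully on or fully off.
    The setpoint decides only *whether* the heater runs, never how hard."""
    heater_on = room_temp < setpoint
    return room_temp + (heater_power if heater_on else 0.0) - heat_loss

def steps_to_reach(start_temp, target_temp, setpoint):
    """Count time steps until the room first reaches target_temp."""
    temp, steps = start_temp, 0
    while temp < target_temp and steps < 10_000:
        temp = bang_bang_step(temp, setpoint)
        steps += 1
    return steps

# A comfortable setting (20) and an extreme setting (30) take exactly the
# same time to bring the room to 20 degrees; the extreme setting only
# guarantees overshoot after 20 is passed.
assert steps_to_reach(10, 20, setpoint=20) == steps_to_reach(10, 20, setpoint=30)
```

Because the heater is all or nothing, cranking the dial cannot speed anything up; it merely moves the shutoff point past the temperature you actually want.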

But how are you to know? What information helps you understand how the thermostat works? The design problem with the thermostat, as with the refrigerator, is that there are no aids to understanding, no way of forming the correct conceptual model. In fact, the information provided misleads people into forming the wrong, quite inappropriate model.

The real point of these examples is not that some people have erroneous beliefs; it is that everyone forms stories (conceptual models) to explain what they have observed. In the absence of external information, people can let their imagination run free as long as the conceptual models they develop account for the facts as they perceive them. As a result, people use their thermostats inappropriately, causing themselves unnecessary effort, and often resulting in large temperature swings, thus wasting energy, which is both a needless expense and bad for the environment. (Later in this chapter, page 69, I provide an example of a thermostat that does provide a useful conceptual model.)

Blaming the Wrong Things

People try to find causes for events. They tend to assign a causal relation whenever two things occur in succession. If some unexpected event happens in my home just after I have taken some action, I am apt to conclude that it was caused by that action, even if there really was no relationship between the two. Similarly, if I do something expecting a result and nothing happens, I am apt to interpret this lack of informative feedback as an indication that I didn’t do the action correctly: the most likely thing to do, therefore, is to repeat the action, only with more force. Push a door and it fails to open? Push again, harder. With electronic devices, if the feedback is delayed sufficiently, people often conclude that the press wasn’t registered, so they do the same action again, sometimes repeatedly, unaware that all of their presses were recorded. This can lead to unintended results. Repeated presses might intensify the response much more than was intended. Alternatively, a second request might cancel the previous one, so that an odd number of presses produces the desired result, whereas an even number leads to no result.
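The odd-versus-even press problem is easy to see in a toy model. The class below is purely illustrative (no real device exposes this interface): a control that toggles its state on every press but gives no immediate feedback, so every "retry" press registers.

```python
# Illustrative model of a toggle control with delayed feedback.
# Every press is recorded even though the user sees nothing for a while,
# so an even number of retry presses cancels out the intended change.

class TogglingLight:
    def __init__(self):
        self.is_on = False

    def press(self):
        self.is_on = not self.is_on   # each press flips the state

light = TogglingLight()
light.press()                 # user presses once...
light.press()                 # ...sees no feedback, presses again
assert light.is_on is False   # even number of presses: back where we started

light.press()
assert light.is_on is True    # odd number of presses: the desired result
```

The user's conceptual model ("my press wasn't recorded") and the device's actual behavior ("every press toggles") diverge, which is exactly what produces the puzzling odd/even outcome.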

The tendency to repeat an action when the first attempt fails can be disastrous. This has led to numerous deaths when people tried to escape a burning building by attempting to push open exit doors that opened inward, doors that should have been pulled. As a result, in many countries, the law requires doors in public places to open outward, and moreover to be operated by so-called panic bars, so that they automatically open when people, in a panic to escape a fire, push their bodies against them. This is a great application of appropriate affordances: see the door in Figure 2.5.

Modern systems try hard to provide feedback within 0.1 second of any operation, to reassure the user that the request was received. This is especially important if the operation will take considerable time. The presence of a filling hourglass or rotating clock hands is a reassuring sign that work is in progress. When the delay can be predicted, some systems provide time estimates as well as progress bars to indicate how far along the task has gone. More systems should adopt these sensible displays to provide timely and meaningful feedback of results.
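One way to act on this advice is to display an estimated range of completion times, or, when only a single number fits the display, the pessimistic (longest) value, so that reality is likely to beat the user's expectations. The function name and message format below are invented for this sketch; nothing here comes from a particular toolkit.

```python
# Sketch of displaying a time-remaining estimate.  Given a computed range
# of possible completion times, show the range, or, if only one number
# fits, show the slowest value so that expectations are exceeded.

def format_estimate(low_seconds, high_seconds, single_value=False):
    """Render a time-remaining message from a computed range."""
    if single_value:
        # Show the pessimistic bound: finishing early is a pleasant surprise.
        return f"About {high_seconds} seconds remaining"
    return f"{low_seconds}-{high_seconds} seconds remaining"

print(format_estimate(20, 45))                     # range display
print(format_estimate(20, 45, single_value=True))  # pessimistic single value
```

Either form manages the user's expectations: the range is honest about uncertainty, and the single pessimistic value guarantees that the actual wait feels shorter than promised.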

FIGURE 2.5. Panic Bars on Doors. People fleeing a fire would die if they encountered exit doors that opened inward, because they would keep trying to push them outward, and when that failed, they would push harder. The proper design, now required by law in many places, is to change the design of doors so that they open when pushed. Here is one example: an excellent design strategy for dealing with real behavior, using the proper affordance coupled with a graceful signifier, the black bar, which indicates where to push. (Photograph by the author at the Ford Design Center, Northwestern University.)

Some studies show it is wise to underpredict; that is, to say an operation will take longer than it actually will. When the system computes the amount of time, it can compute the range of possible times. In that case it ought to display the range, or, if only a single value is desirable, show the slowest, longest value. That way, expectations are likely to be exceeded, leading to a happy result.

When it is difficult to determine the cause of a difficulty, where do people put the blame? Often people will use their own conceptual models of the world to determine the perceived causal relationship between the thing being blamed and the result. The word perceived is critical: the causal relationship does not have to exist; the person simply has to think it is there. Sometimes the result is to attribute cause to things that had nothing to do with the action.

Suppose I try to use an everyday thing, but I can’t. Who is at fault: me or the thing? We are apt to blame ourselves, especially if others are able to use it. Suppose the fault really lies in the device, so that lots of people have the same problems. Because everyone perceives the fault to be his or her own, nobody wants to admit to having trouble. This creates a conspiracy of silence, in which feelings of guilt and helplessness are kept hidden.

Interestingly enough, the common tendency to blame ourselves for failures with everyday objects goes against the normal attributions we make about ourselves and others. Everyone sometimes acts in a way that seems strange, bizarre, or simply wrong and inappropriate. When we do this, we tend to attribute our behavior to the environment. When we see others do it, we tend to attribute it to their personalities.

Here is a made-up example. Consider Tom, the office terror. Today, Tom got to work late, yelled at his colleagues because the office coffee machine was empty, then ran to his office and slammed the door shut. “Ah,” his colleagues and staff say to one another, “there he goes again.”

Now consider Tom’s point of view. “I really had a hard day,” Tom explains. “I woke up late because my alarm clock failed to go off: I didn’t even have time for my morning coffee. Then I couldn’t find a parking spot because I was late. And there wasn’t any coffee in the office machine; it was all out. None of this was my fault—I had a run of really bad events. Yes, I was a bit curt, but who wouldn’t be under the same circumstances?”

Tom’s colleagues don’t have access to his inner thoughts or to his morning’s activities. All they see is that Tom yelled at them simply because the office coffee machine was empty. This reminds them of another similar event. “He does that all the time,” they conclude, “always blowing up over the most minor things.”

Who is correct? Tom or his colleagues? The events can be seen from two different points of view with two different interpretations: common responses to the trials of life, or the result of an explosive, irascible personality.

It seems natural for people to blame their own misfortunes on the environment. It seems equally natural to blame other people’s misfortunes on their personalities. Just the opposite attribution, by the way, is made when things go well. When things go right, people credit their own abilities and intelligence. The onlookers do the reverse. When they see things go well for someone else, they sometimes credit the environment, or luck.

In all such cases, whether a person is inappropriately accepting blame for the inability to work simple objects or attributing behavior to environment or personality, a faulty conceptual model is at work.

LEARNED HELPLESSNESS

The phenomenon called learned helplessness might help explain the self-blame. It refers to the situation in which people experience repeated failure at a task. As a result, they decide that the task cannot be done, at least not by them: they are helpless. They stop trying. If this feeling covers a group of tasks, the result can be severe difficulties in coping with life. In the extreme case, such learned helplessness leads to depression and to a belief that the person cannot cope with everyday life at all. Sometimes all it takes to produce such a feeling of helplessness are a few experiences that accidentally turn out badly. The phenomenon has been most frequently studied as a precursor to the clinical problem of depression, but I have seen it happen after a few bad experiences with everyday objects.

Do common technology and mathematics phobias result from a kind of learned helplessness? Could a few instances of failure in what appear to be straightforward situations generalize to every technological object, every mathematics problem? Perhaps. In fact, the design of everyday things (and the design of mathematics courses) seems almost guaranteed to cause this. We could call this phenomenon taught helplessness.

When people have trouble using technology, especially when they perceive (usually incorrectly) that nobody else is having the same problems, they tend to blame themselves. Worse, the more trouble they have, the more helpless they may feel, believing that they must be technically or mechanically inept. This is just the opposite of the more normal situation, where people blame their difficulties on the environment. The false self-blame is especially ironic because the culprit here is usually the poor design of the technology, so blaming the environment (the technology) would be completely appropriate.

Consider the normal mathematics curriculum, which continues relentlessly on its way, each new lesson assuming full knowledge and understanding of all that has passed before. Even though each point may be simple, once you fall behind it is hard to catch up. The result: mathematics phobia—not because the material is difficult, but because it is taught so that difficulty in one stage hinders further progress. The problem is that once failure starts, it is soon generalized by self-blame to all of mathematics. Similar processes are at work with technology.

The vicious cycle starts: if you fail at something, you think it is your fault. Therefore you think you can’t do that task. As a result, the next time you have to do the task, you believe you can’t, so you don’t even try. The result is that you can’t, just as you thought.

You’re trapped in a self-fulfilling prophecy.

POSITIVE PSYCHOLOGY

Just as we learn to give up after repeated failure, we can learn optimistic, positive responses to life. For years, psychologists focused upon the gloomy story of how people failed, on the limits of human abilities, and on psychopathologies—depression, mania, paranoia, and so on. But the twenty-first century sees a new approach: a focus upon positive psychology, a culture of positive thinking, of feeling good about oneself. In fact, the normal emotional state of most people is positive. When something doesn’t work, it can be considered an interesting challenge, or perhaps just a positive learning experience.

We need to remove the word failure from our vocabulary, replacing it instead with learning experience. To fail is to learn: we learn more from our failures than from our successes. With success, sure, we are pleased, but we often have no idea why we succeeded. With failure, it is often possible to figure out why, to ensure that it will never happen again.

Scientists know this. Scientists do experiments to learn how the world works. Sometimes their experiments work as expected, but often they don’t. Are these failures? No, they are learning experiences. Many of the most important scientific discoveries have come from these so-called failures.

Failure can be such a powerful learning tool that many designers take pride in the failures that happen while a product is still in development. One design firm, IDEO, has it as a creed: “Fail often, fail fast,” they say, for they know that each failure teaches them a lot about what to do right. Designers need to fail, as do researchers. I have long held the belief—and encouraged it in my students and employees—that failures are an essential part of exploration and creativity. If designers and researchers do not sometimes fail, it is a sign that they are not trying hard enough—they are not thinking the great creative thoughts that will provide breakthroughs in how we do things. It is possible to avoid failure, to always be safe. But that is also the route to a dull, uninteresting life. The designs of our products and services must also follow this philosophy. So, to the designers who are reading this, let me give some advice.

Falsely Blaming Yourself

I have studied people making errors—sometimes serious ones—with mechanical devices, light switches and fuses, computer operating systems and word processors, even airplanes and nuclear power plants. Invariably people feel guilty and either try to hide the error or blame themselves for “stupidity” or “clumsiness.” I often have difficulty getting permission to watch: nobody likes to be observed performing badly. I point out that the design is faulty and that others make the same errors, yet if the task appears simple or trivial, people still blame themselves. It is almost as if they take perverse pride in thinking of themselves as mechanically incompetent.

I once was asked by a large computer company to evaluate a brand-new product. I spent a day learning to use it and trying it out on various problems. In using the keyboard to enter data, it was necessary to differentiate between the Return key and the Enter key. If the wrong key was pressed, the last few minutes’ work was irrevocably lost.

I pointed out this problem to the designer, explaining that I, myself, had made the error frequently and that my analyses indicated that this was very likely to be a frequent error among users. The designer’s first response was: “Why did you make that error? Didn’t you read the manual?” He proceeded to explain the different functions of the two keys.

“Yes, yes,” I explained, “I understand the two keys, I simply confuse them. They have similar functions, are located in similar locations on the keyboard, and as a skilled typist, I often hit Return automatically, without thought. Certainly others have had similar problems.” “Nope,” said the designer. He claimed that I was the only person who had ever complained, and the company’s employees had been using the system for many months. I was skeptical, so we went together to some of the employees and asked them whether they had ever hit the Return key when they should have hit Enter. And did they ever lose their work as a result?

“Oh, yes,” they said, “we do that a lot.” Well, how come nobody ever said anything about it? After all, they were encouraged to report all problems with the system. The reason was simple: when the system stopped working or did something strange, they dutifully reported it as a problem. But when they made the Return versus Enter error, they blamed themselves. After all, they had been told what to do. They had simply erred.

The idea that a person is at fault when something goes wrong is deeply entrenched in society. That’s why we blame others and even ourselves. Unfortunately, the idea that a person is at fault is imbedded in the legal system. When major accidents occur, official courts of inquiry are set up to assess the blame. More and more often the blame is attributed to “human error.” The person involved can be fined, punished, or fired. Maybe training procedures are revised. The law rests comfortably. But in my experience, human error usually is a result of poor design: it should be called system error. Humans err continually; it is an intrinsic part of our nature. System design should take this into account. Pinning the blame on the person may be a comfortable way to proceed, but why was the system ever designed so that a single act by a single person could cause calamity? Worse, blaming the person without fixing the root, underlying cause does not fix the problem: the same error is likely to be repeated by someone else. I return to the topic of human error in Chapter 5.

Of course, people do make errors. Complex devices will always require some instruction, and someone using them without instruction should expect to make errors and to be confused. But designers should take special pains to make errors as cost-free as possible. Here is my credo about errors:

Eliminate the term human error. Instead, talk about communication and interaction: what we call an error is usually bad communication or interaction. When people collaborate with one another, the word error is never used to characterize another person’s utterance. That’s because each person is trying to understand and respond to the other, and when something is not understood or seems inappropriate, it is questioned, clarified, and the collaboration continues. Why can’t the interaction between a person and a machine be thought of as collaboration?

Machines are not people. They can’t communicate and understand the same way we do. This means that their designers have a special obligation to ensure that the behavior of machines is understandable to the people who interact with them. True collaboration requires each party to make some effort to accommodate and understand the other. When we collaborate with machines, it is people who must do all the accommodation. Why shouldn’t the machine be more friendly? The machine should accept normal human behavior, but just as people often subconsciously assess the accuracy of things being said, machines should judge the quality of the information given to them, in this case to help their operators avoid grievous errors because of simple slips (discussed in Chapter 5). Today, we insist that people perform abnormally, to adapt themselves to the peculiar demands of machines, which includes always giving precise, accurate information. Humans are particularly bad at this, yet when they fail to meet the arbitrary, inhuman requirements of machines, we call it human error. No, it is design error.

Designers should strive to minimize the chance of inappropriate actions in the first place by using affordances, signifiers, good mapping, and constraints to guide the actions. If a person performs an inappropriate action, the design should maximize the chance that this can be discovered and then rectified. This requires good, intelligible feedback coupled with a simple, clear conceptual model. When people understand what has happened, what state the system is in, and what the most appropriate set of actions is, they can perform their activities more effectively.

People are not machines. Machines don’t have to deal with continual interruptions; people are subjected to them constantly. As a result, we are often bouncing back and forth between tasks, and when we return to a previous task we have to recover our place, what we were doing, and what we were thinking. No wonder we sometimes forget our place when we return to the original task, either skipping or repeating a step, or imprecisely retaining the information we were about to enter.

Our strengths are in our flexibility and creativity, in coming up with solutions to novel problems. We are creative and imaginative, not mechanical and precise. Machines require precision and accuracy; people don’t. And we are particularly bad at providing precise and accurate inputs. So why are we always required to do so? Why do we put the requirements of machines above those of people?

When people interact with machines, things will not always go smoothly. This is to be expected, so designers should anticipate it. It is easy to design devices that work well when everything goes as planned. The hard and necessary part of design is to make things work well even when things do not go as planned.

HOW TECHNOLOGY CAN ACCOMMODATE HUMAN BEHAVIOR

In the past, cost prevented many manufacturers from providing useful feedback that would assist people in forming accurate conceptual models. The cost of color displays large and flexible enough to provide the required information was prohibitive for small, inexpensive devices. But as the cost of sensors and displays has dropped, it is now possible to do a lot more. Thanks to display screens, telephones are much easier to use than ever before, so my extensive criticisms of phones found in the earlier edition of this book have been removed. I look forward to great improvements in all our devices now that the importance of these design principles is becoming recognized and the enhanced quality and lower cost of displays make it possible to implement the ideas.

PROVIDING A CONCEPTUAL MODEL FOR A HOME THERMOSTAT

My thermostat, for example (designed by Nest Labs), has a colorful display that is normally off, turning on only when it senses that I

FIGURE 2. A Thermostat with an Explicit Conceptual Model. This thermostat, manufactured by Nest Labs, helps people form a good conceptual model of its operation. Photo A shows the thermostat. The background, blue, indicates that it is now cooling the home. The current temperature is 75°F (24°C) and the target temperature is 72°F (22°C), which it expects to reach in 20 minutes. Photo B shows its use of a smart phone to deliver a summary of its settings and the home’s energy use. Both A and B combine to help the home dweller develop conceptual models of the thermostat and the home’s energy consumption. (Photographs courtesy of Nest Labs, Inc.)

am nearby. Then it provides me with the current temperature of the room, the temperature to which it is set, and whether it is heating or cooling the room (the background color changes from black when it is neither heating nor cooling, to orange while heating, or to blue while cooling). It learns my daily patterns, so it changes temperature automatically, lowering it at bedtime, raising it again in the morning, and going into “away” mode when it detects that nobody is in the house. All the time, it explains what it is doing. Thus, when it has to change the room temperature substantially (either because someone has entered a manual change or because it has decided that it is now time to switch), it gives a prediction: “Now 75°, will be 72° in 20 minutes.” In addition, Nest can be connected wirelessly to smart devices that allow for remote operation of the thermostat and also for larger screens to provide a detailed analysis of its performance, aiding the home occupant’s development of a conceptual model both of Nest and also of the home’s energy consumption. Is Nest perfect? No, but it marks an improvement in the collaborative interaction of people and everyday things.

ENTERING DATES, TIMES, AND TELEPHONE NUMBERS

Many machines are programmed to be very fussy about the form of input they require, where the fussiness is not a requirement of the machine but due to the lack of consideration for people in the design of the software. In other words: inappropriate programming. Consider these examples.

Many of us spend hours filling out forms on computers—forms that require names, dates, addresses, telephone numbers, monetary sums, and other information in a fixed, rigid format. Worse, often we are not even told the correct format until we get it wrong. Why not figure out the variety of ways a person might fill out a form and accommodate all of them? Some companies have done excellent jobs at this, so let us celebrate their actions. Consider Microsoft’s calendar program. Here, it is possible to specify dates any way you like: “November 23, 2015,” “23 Nov. 15,” or “11.23.15.” It even accepts phrases such as “a week from Thursday,” “tomorrow,” “a week from tomorrow,” or “yesterday.” Same with time. You can enter the time any way you want: “3:45 PM,” “15.35,” “an hour,” “two and one-half hours.” Same with telephone numbers: Want to start with a + sign (to indicate the code for international dialing)? No problem. Like to separate the number fields with spaces, dashes, parentheses, slashes, periods? No problem. As long as the program can decipher the date, time, or telephone number into a legal format, it is accepted. I hope the team that worked on this got bonuses and promotions.

Although I single out Microsoft for being the pioneer in accepting a wide variety of formats, it is now becoming standard practice. By the time you read this, I would hope that every program would permit any intelligible format for names, dates, phone numbers, street addresses, and so on, transforming whatever is entered into whatever form the internal programming needs. But I predict that even in the twenty-second century, there will still be forms that require precise, accurate (but arbitrary) formats for no reason except the laziness of the programming team. Perhaps in the years that pass between this book’s publication and when you are reading this, great improvements will have been made. If we are all lucky, this section will be badly out of date. I hope so.
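The format-forgiving approach described above is straightforward to implement: try a list of known patterns in turn, and normalize whatever matches into one internal representation. Here is a minimal sketch in Python, using only the standard library; the format list and the relative-phrase table are illustrative assumptions (covering a subset of the phrases mentioned in the text), not any product's actual implementation.

```python
from datetime import date, datetime, timedelta

# Candidate formats, tried in order; extend this list to accept more styles.
DATE_FORMATS = ["%B %d, %Y", "%d %b %y", "%d %b. %y", "%m.%d.%y", "%m/%d/%Y", "%Y-%m-%d"]

# A few relative phrases mapped to day offsets from "today".
RELATIVE = {"today": 0, "tomorrow": 1, "yesterday": -1, "a week from tomorrow": 8}

def parse_date(text, today=None):
    """Return a date for any of several human formats, or None if unreadable."""
    today = today or date.today()
    if text.strip().lower() in RELATIVE:
        return today + timedelta(days=RELATIVE[text.strip().lower()])
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # this pattern didn't match; try the next one
    return None  # the caller can then ask the person to rephrase
```

With this, “November 23, 2015,” “23 Nov. 15,” “11.23.15,” and “tomorrow” all normalize to the same internal value; the burden of format translation falls on the program, not the person.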

The Seven Stages of Action: Seven Fundamental Design Principles

The seven-stage model of the action cycle can be a valuable design tool, for it provides a basic checklist of questions to ask. In general, each stage of action requires its own special design strategies and, in turn, provides its own opportunity for disaster. Figure 2.7 summarizes the questions:

  1. What do I want to accomplish?
  2. What are the alternative action sequences?
  3. What action can I do now?
  4. How do I do it?
  5. What happened?
  6. What does it mean?
  7. Is this okay? Have I accomplished my goal?

Anyone using a product should always be able to determine the answers to all seven questions. This puts the burden on the designer

FIGURE 2.7. The Seven Stages of Action as Design Aids. Each of the seven stages indicates a place where the person using the system has a question. The seven questions pose seven design themes. How should the design convey the information required to answer the user’s question? Through appropriate constraints and mappings, signifiers and conceptual models, feedback and visibility. The information that helps answer questions of execution (doing) is feedforward. The information that aids in understanding what has happened is feedback.

to ensure that at each stage, the product provides the information required to answer the question.

The information that helps answer questions of execution (doing) is feedforward. The information that aids in understanding what has happened is feedback. Everyone knows what feedback is. It helps you know what happened. But how do you know what you can do? That’s the role of feedforward, a term borrowed from control theory.

Feedforward is accomplished through appropriate use of signifiers, constraints, and mappings. The conceptual model plays an important role. Feedback is accomplished through explicit information about the impact of the action. Once again, the conceptual model plays an important role.

Both feedback and feedforward need to be presented in a form that is readily interpreted by the people using the system. The presentation has to match how people view the goal they are trying to achieve and their expectations. Information must match human needs.

The insights from the seven stages of action lead us to seven fundamental principles of design:

  1. Discoverability. It is possible to determine what actions are possible and the current state of the device.
  2. Feedback. There is full and continuous information about the results of actions and the current state of the product or service.
  3. Conceptual model. The design projects all the information needed to create a good conceptual model of the system, leading to understanding and a feeling of control. The conceptual model enhances both discoverability and evaluation of results.
  4. Affordances. The proper affordances exist to make the desired actions possible.
  5. Signifiers. Effective use of signifiers ensures discoverability and that the feedback is well communicated and intelligible.
  6. Mappings. The relationship between controls and their actions follows the principles of good mapping, enhanced as much as possible through spatial layout and temporal contiguity.
  7. Constraints. Providing physical, logical, semantic, and cultural constraints guides actions and eases interpretation.

The next time you can’t immediately figure out the shower control in a hotel room or have trouble using an unfamiliar television set or kitchen appliance, remember that the problem is in the design. Ask yourself where the problem lies. At which of the seven stages of action does it fail? Which design principles are deficient?

But it is easy to find fault: the key is to be able to do things better. Ask yourself how the difficulty came about. Realize that many different groups of people might have been involved, each of which might have had intelligent, sensible reasons for its actions. For example, a troublesome bathroom shower might have been designed by people who had no way of knowing how it would be installed; the shower controls might have been selected by a building contractor to fit the home plans provided by yet another person; and finally, a plumber, who may not have had contact with any of the other people, did the installation. Where did the problems arise? It could have been at any one (or several) of these stages. The result may appear to be poor design, but it may actually arise from poor communication.

One of my self-imposed rules is, “Don’t criticize unless you can do better.” Try to understand how the faulty design might have occurred: try to determine how it could have been done otherwise. Thinking about the causes and possible fixes of bad design should make you better appreciate good design. So, the next time you come across a well-designed object, one that you can use smoothly and effortlessly on the first try, stop and examine it. Consider how well it masters the seven stages of action and the principles of design. Recognize that most of our interactions with products are actually interactions with a complex system: good design requires consideration of the entire system to ensure that the requirements, intentions, and desires at each stage are faithfully understood and respected at all the other stages.

Survey Research in HCI

Hendrik Müller, Aaron Sedley, and Elizabeth Ferrall-Nunge

Short Description of the Method

A survey is a method of gathering information by asking questions to a subset of people, the results of which can be generalized to the wider target population. There are many different types of surveys, many ways to sample a population, and many ways to collect data from that population. Traditionally, surveys have been administered via mail, telephone, or in person. The Internet has become a popular mode for surveys due to the low cost of gathering data, the ease and speed of survey administration, and its broadening reach across a variety of populations worldwide. Surveys in human–computer interaction (HCI) research can be useful for many purposes.

H. Müller (*): Google Australia Pty Ltd., Level 5, 48 Pirrama Road, Pyrmont, NSW 2009, Australia; e-mail: hendrik82@gmail.com
A. Sedley: Google, Inc., 1600 Amphitheatre Parkway, Mountain View, CA 94043, USA; e-mail: asedley@gmail.com
E. Ferrall-Nunge: Twitter, Inc., 1355 Market Street, Suite 900, San Francisco, CA 94103, USA; e-mail: enunge@gmail.com

J.S. Olson and W.A. Kellogg (eds.), Ways of Knowing in HCI, DOI 10.1007/978-1-4939-0378-8_10, © Springer Science+Business Media New York 2014


While powerful for specific needs, surveys do not allow for observation of the respondents’ context or follow-up questions. When conducting research into precise behaviors, underlying motivations, and the usability of systems, other research methods may be more appropriate or needed as a complement.

This chapter reviews the history of surveys and appropriate uses of surveys and focuses on the best practices in survey design and execution.

History, Intellectual Tradition, Evolution

Since ancient times, societies have measured their populations via censuses for food planning, land distribution, taxation, and military conscription. Beginning in the nineteenth century, political polling was introduced in the USA to project election results and to measure citizens’ sentiment on a range of public policy issues. At the emergence of contemporary psychology, Francis Galton pioneered the use of questionnaires to investigate the nature vs. nurture debate and differences between humans, the latter of which evolved into the field of psychometrics (Clauser, 2007). More recently, surveys have been used in HCI research to help answer a variety of questions related to people’s attitudes, behaviors, and experiences with technology.

Though nineteenth-century political polls amplified public interest in surveys, it was not until the twentieth century that meaningful progress was made on survey sampling methods and data representativeness. Following two incorrect predictions of the US presidential victors by major polls (Literary Digest for Landon in 1936 and Gallup for Dewey in 1948), sampling methods were assailed for misrepresenting the US electorate. Scrutiny of these polling failures; persuasive academic work by statisticians such as Kiaer, Bowley, and Neyman; and extensive experimentation by the US Census Bureau led to the acceptance of random sampling as the gold standard for surveys (Converse, 1987).

Roughly in parallel, social psychologists aimed to minimize questionnaire biases and optimize data collection. For example, in the 1920s and 1930s, Louis Thurstone and Rensis Likert demonstrated reliable methods for measuring attitudes (Edwards & Kenney, 1946); Likert’s scaling approach is still widely used by survey practitioners. Stanley Payne’s 1951 classic “The Art of Asking Questions” was an early study of question wording. Subsequent academics scrutinized every aspect of survey design. Tourangeau (1984) articulated the four cognitive steps to survey responses, noting that people have to comprehend what is asked, retrieve the appropriate information, judge that information according to the question, and map the judgment onto the provided responses. Krosnick and Fabrigar (1997) studied many components of questionnaire design, such as scale length, text labels, and “no opinion” responses. Groves (1989) identified four types of survey-related error: coverage, sampling, measurement, and non-response. As online surveys grew in popularity, Couper (2008) and others studied bias from the visual design of Internet questionnaires.

The use of surveys for HCI research certainly predates the Internet, with efforts to understand users’ experiences with computer hardware and software. In 1983, researchers at Carnegie Mellon University conducted an experiment comparing

Fig. 1 Summary of the key stages in survey history (timeline: 1824, 1876, 1920s–30s, 1948, 1983, 1984, 1989, 1990s)

computer-collected survey responses with those from a printed questionnaire, finding less socially desirable responses and longer open-ended responses in the digital survey than in the printed questionnaire (Kiesler & Sproull, 1986). With the popularization of graphical user interfaces in the 1980s, surveys joined other methods for usability research. Several standardized questionnaires were developed to assess usability (e.g., SUS, QUIS, SUMI, summarized later in this chapter). Surveys are a direct means of measuring satisfaction; along with efficiency and effectiveness, satisfaction is a pillar of the ISO 9241, part 11, definition of usability (Abran et al., 2003). User happiness is fundamental to Google’s HEART framework for user-centric measurement of Web applications (Rodden, Hutchinson, & Fu, 2010). In 1994, the Georgia Institute of Technology started annual online surveys to understand Internet usage and users and to explore Web-based survey research (Pitkow & Recker, 1994). As the Internet era progressed, online applications widely adopted surveys to measure users’ satisfaction, unaddressed needs, and problems experienced, in addition to user profiling. See a summary of key stages in survey history in Fig. 1.

What Questions the Method Can Answer

When used appropriately, surveys can help inform application and user research strategies and provide insights into users’ attitudes, experiences, intents, demographics, and psychographic characteristics. However, surveys are not the most appropriate method for many other HCI research goals. Ethnographic interviews, log data analysis, card sorts, usability studies, and other methods may be more appropriate. In some cases, surveys can be used with other research methods to holistically inform HCI development. This section explains survey appropriateness, when to avoid using surveys, as well as how survey research can complement other research methods.

When Surveys Are Appropriate

Overall, surveys are appropriate when needing to represent an entire population, to measure differences between groups of people, and to identify changes over time in people’s attitudes and experiences. Below are examples of how survey data can be used in HCI research.

Attitudes. Surveys can accurately measure and reliably represent attitudes and perceptions of a population. While qualitative studies are able to gather attitudinal data, surveys provide statistically reliable metrics, allowing researchers to benchmark attitudes toward an application or an experience, to track changes in attitudes over time, and to tie self-reported attitudes to actual behavior (e.g., via log data). For example, surveys can be used to measure customer satisfaction with online banking immediately following their experiences.

Intent. Surveys can collect people’s reasons for using an application at a specific time, allowing researchers to gauge the frequency of different objectives. Unlike other methods, surveys can be deployed while a person is actually using an application (i.e., an online intercept survey), minimizing the risk of imperfect recall on the respondent’s part. Note that the specific details and context of one’s intent may not be fully captured in a survey alone. For example, “Why did you visit this website?” could be answered in a survey, but qualitative research may be more appropriate for determining how well one understood specific application elements and what users’ underlying motivations are in the context of their daily lives.

Task success. Similar to measuring intent, while HCI researchers can qualitatively observe task success through a lab or a field study, a survey can be used to reliably quantify levels of success. For example, respondents can be instructed to perform a certain task, enter the results of the task, and report on their experiences while performing it.

User experience feedback. Collecting open-ended feedback about a user’s experience can be used to understand the user’s interaction with technology or to inform system requirements and improvements. For example, by understanding the relative frequency of key product frustrations and benefits, project stakeholders can make informed decisions and trade-offs when allocating resources.

User characteristics. Surveys can be used to understand a system’s users and to better serve their needs. Researchers can collect users’ demographic information, technographic details such as system savviness or overall tech savviness, and psychographic variables such as openness to change and privacy orientation. Such data enable researchers to discover natural segments of users who may have different needs, motivations, attitudes, perceptions, and overall user experiences.

Interactions with technology. Surveys can be used to understand more broadly how people interact with technology and how technology influences social interactions with others, by asking people to self-report on social, psychological, and demographic variables while capturing their behaviors. Through the use of surveys, HCI researchers can glean insights into the effects technology has on the general population.

Awareness. Surveys can also help in understanding people’s awareness of existing technologies or specific application features. Such data can, for example, help researchers determine whether low usage of an application is a result of poor awareness or of other factors, such as usability issues. By quantifying how aware or unaware people are, researchers can decide whether efforts (e.g., marketing campaigns) are needed to increase overall awareness and thus use.

Comparisons. Surveys can be used to compare users’ attitudes, perceptions, and experiences across user segments, time, geographies, and competing applications, and between experimental and control versions. Such data enable researchers to explore whether user needs and experiences vary across geographies, to assess an application’s strengths and weaknesses among competing technologies and how each compares with competitors’ applications, and to evaluate potential application improvements while aiding decision making among a variety of proposed designs.
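The "statistically reliable metrics" mentioned above can be made concrete with a standard calculation: the margin of error for an estimated proportion. A brief sketch follows, assuming simple random sampling and the usual normal approximation; the sample numbers are invented for illustration.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of a confidence interval for a proportion p_hat
    estimated from a simple random sample of size n, using the
    normal approximation; z = 1.96 gives a 95% interval."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Illustrative example: 420 of 600 sampled users report being satisfied.
p_hat = 420 / 600                    # estimated satisfaction = 0.70
moe = margin_of_error(p_hat, 600)    # roughly 0.037
# Satisfaction would be reported as 70%, plus or minus about 3.7 points.
```

This is what distinguishes survey metrics from qualitative findings: with a known sampling design, the uncertainty of each estimate can itself be quantified, which makes benchmarking and tracking over time meaningful.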

When to Avoid Using a Survey

Because surveys are inexpensive and easy to deploy compared to other methods, many people choose survey research even when it is inappropriate for their needs. Such surveys can produce invalid or unreliable data, leading to an inaccurate understanding of a population and poor user experiences. Below are some HCI research needs that are better addressed with other methods.

Precise behaviors. While respondents can be asked to self-report their behaviors, gathering this information from log data, if available, will always be more accurate. This is particularly true when trying to understand precise user behaviors and flows, as users will struggle to recall their exact sequence of clicks or the specific pages they visited. For behaviors not captured in log data, a diary study, observational study, or experience sampling may gather more accurate results than a survey.

Underlying motivations. People often do not understand or are unable to explain why they take certain actions or prefer one thing over another. Someone may be able to report their intent in a survey but may not be aware of their subconscious motivations for specific actions. Exploratory research methods such as ethnography or contextual inquiry may be more appropriate than directly asking about underlying motivations in a survey.

Usability evaluations. Surveys are inappropriate for testing specific usability tasks and understanding of tools and application elements. As mentioned above, surveys can measure task success but may not explain why people cannot use a particular application, why they do not understand some aspect of a product, or why they cannot identify the missteps that caused the task failure. Furthermore, a user may still be able to complete a given task despite encountering several points of confusion, which could not be uncovered through a survey. Task-based observational research and interview methods, such as usability studies, are better suited for such research goals.

Fig. 2

Using Surveys with Other Methods

Survey research may be especially beneficial when used in conjunction with other research methods (see Fig. 2). Surveys can follow previous qualitative studies to help quantify specific observations. For many surveys, up-front qualitative research may even be required to inform their content if no previous research exists. On the other hand, surveys can also be used to initially identify high-level insights that can be followed by in-depth research through more qualitative (meaning smaller sample) methods.

For example, if a usability study uncovers a specific problem, a survey can quantify the frequency of that problem across the population. Or a survey can be used first to identify the range of frustrations or goals, followed by qualitative interviews and observational research to gain deeper insights into self-reported behaviors and sources of frustration. Researchers may interview survey respondents to clarify responses (e.g., Yew, Shamma, & Churchill, 2011), interview another pool of participants in the same population for comparison (e.g., Froelich et al., 2012), or interview both survey respondents and new participants (e.g., Archambault & Grudin, 2012).

Surveys can also be used in conjunction with A/B experiments to aid comparative evaluations. For example, when researching two different versions of an application, the same survey can be used to assess both. By doing this, differences in variables such as satisfaction and self-reported task success can be measured and analyzed in parallel with behavioral differences observed in log data. Log data may show that one experimental version drives more traffic or engagement, but the survey may show that users were less satisfied or unable to complete a task. Moreover, log data can further validate insights from a previously conducted survey. For example, a social recommendation study by Chen, Geyer, Dugan, Muller, and Guy (2009) tested the quality of recommendations first in a survey and then through logging in a large field deployment. Psychophysiological data may be another objective accompaniment to survey data. For example, game researchers have combined surveys with data such as facial muscle and electrodermal activity (Nacke, Grimshaw, & Lindley, 2010) or attention and meditation as measured with EEG sensors (Schild, LaViola, & Masuch, 2012).

Survey Research in HCI

How to Do It: What Constitutes Good Work

This section breaks down survey research into the following six stages:

1. Research goals and constructs
2. Population and sampling
3. Questionnaire design and biases
4. Review and survey pretesting
5. Implementation and launch
6. Data analysis and reporting

Research Goals and Constructs


Before writing survey questions, researchers should first think about what they intend to measure, what kind of data needs to be collected, and how the data will be used to meet the research goals. When the survey-appropriate research goals have been identified, they should be matched to constructs, i.e., unidimensional attributes that cannot be directly observed. The identified constructs should then be converted into one or multiple survey questions. Constructs can be identified from prior primary research or literature reviews. Asking multiple questions about the same construct and analyzing the responses, e.g., through factor analysis, may help the researcher ensure the construct's validity.

An example will illustrate the process of converting constructs into questions. An overarching research goal may be to understand users' happiness with an online application, such as Google Search, a widely used Web search engine. Since happiness with an application is often multidimensional, it is important to separate it into measurable pieces, i.e., its constructs. Prior research might indicate that constructs such as "overall satisfaction," "perceived speed," and "perceived utility" contribute to users' happiness with that application. When all the constructs have been identified, survey questions can be designed to measure each. To validate each construct, it is important to evaluate its unique relationship with the higher-level goal, using correlation, regression, factor analysis, or other methods. Furthermore, a technique called cognitive pretesting can be used to determine whether respondents are interpreting the constructs as intended by the researcher (see more details in the pretesting section).
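One common internal-consistency check for a multi-item construct is Cronbach's alpha. The sketch below computes it in plain Python; the item scores and the "overall satisfaction" framing are hypothetical, and in practice a statistics package (and methods such as factor analysis, as noted above) would be used.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for k items; `items` is a list of columns,
    one list of respondent scores per item. A high alpha suggests
    the items consistently measure one underlying construct."""
    k = len(items)
    n = len(items[0])

    def variance(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_variances = sum(variance(col) for col in items)
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - item_variances / variance(totals))

# Hypothetical 5-point ratings from four respondents on three
# questions all targeting "overall satisfaction":
satisfaction_items = [[5, 4, 2, 3], [4, 4, 1, 3], [5, 3, 2, 4]]
print(round(cronbach_alpha(satisfaction_items), 2))  # prints 0.93
```

An alpha this high would suggest the three hypothetical items tap the same construct; very low values would argue for revising or splitting the questions.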

Once research goals and constructs are defined, there are several other considerations that help determine whether a survey is the most appropriate method and how to proceed:

• Do the survey constructs focus on results that will directly address research goals and inform stakeholders' decision making, rather than providing merely informative data? An excess of "nice-to-know" questions increases survey length and the likelihood that respondents will not complete the questionnaire, diminishing the effectiveness of the survey results.

Fig. 3 The relationship between population, sampling frame, sample, and respondents

Population and Sampling

Key to effective survey research is determining whom to survey and how many people are needed. To do this, the survey's population, i.e., the set of individuals who meet certain criteria and to whom researchers wish to generalize their results, must first be defined. Reaching everyone in the population (i.e., a census) is typically impossible and unnecessary. Instead, researchers approximate the true population by creating a sampling frame, i.e., the set of people whom the researcher is able to contact for the survey. The perfect sampling frame is identical to the population, but often a survey's sampling frame is only a portion of the population. The people from the sampling frame who are invited to take the survey are the sample, but only those who answer are respondents. See Fig. 3 for an illustration of these different groups.

For example, a survey can be deployed to understand the satisfaction of a product's or an application's users. In this case, the population includes everyone that uses the application, and the sampling frame consists of users who are actually reachable. The sampling frame may exclude those who have abandoned the application, anonymous users, and users who have not opted in to being contacted for research. Though the sampling frame may exclude many users, it could still include far more people than are needed to collect a statistically valid number of responses. However, if the sampling frame systematically excludes certain types of people (e.g., very dissatisfied or disengaged users), the survey will suffer from coverage error, and its responses will misrepresent the population.

Probability Versus Non-probability Sampling

Sampling a population can be accomplished through probability- and non-probability-based methods. Probability or random sampling is considered the gold standard because every person in the sampling frame has an equal, nonzero chance of being chosen for the sample; essentially, the sample is selected completely at random. This minimizes sampling bias, also known as selection bias, by randomly drawing the sample from individuals in the sampling frame and by inviting everyone in the sample in the same way. Examples of probability sampling methods include random digit telephone dialing, address-based mail surveys utilizing the US Postal Service Delivery Sequence File (DSF), and the use of a panel recruited through random sampling, i.e., people who have agreed in advance to receive surveys. For Internet surveys in particular, methods allowing for random sampling include intercept surveys for those who use a particular product (e.g., pop-up surveys or in-product links), list-based samples (e.g., for e-mail invitations), and pre-recruited probability-based panels (see Couper, 2000, for a thorough review). Another way to ensure probability sampling is to use a preexisting sampling frame, i.e., a list of candidates previously assembled using probability sampling methods. For example, Shklovski, Kraut, and Cummings' (2008) study of the effect of residential moves on communication with friends was drawn from a publicly available, highly relevant sampling frame, the National Change of Address (NCOA) database. Another approach is to analyze selected subsets of data from an existing representative survey, such as the General Social Survey (e.g., Wright & Randall, 2012).
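The simplest probability method, a simple random sample drawn from the sampling frame, can be sketched in a few lines; the frame contents and sizes below are hypothetical:

```python
import random

def draw_simple_random_sample(sampling_frame, n, seed=None):
    """Draw a simple random sample without replacement: every member
    of the sampling frame has an equal chance of being invited,
    which minimizes selection bias."""
    return random.Random(seed).sample(sampling_frame, n)

# Hypothetical frame of contactable, opted-in users:
frame = [f"user-{i}" for i in range(10_000)]
invitees = draw_simple_random_sample(frame, 500, seed=42)
```

Fixing a seed makes the draw reproducible for auditing; omitting it yields a fresh random sample each time.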

While probability sampling is ideal, it is often impossible to reach and randomly select from the entire target population, especially when targeting small populations (e.g., users of a specialized enterprise product or experts in a particular field) or investigating sensitive or rare behavior. In these situations, researchers may use non-probability sampling methods such as volunteer opt-in panels, unrestricted self-selected surveys (e.g., links on blogs and social networks), snowball recruiting (i.e., asking for friends of friends), and convenience samples (i.e., targeting people readily available, such as mall shoppers) (Couper, 2000). However, non-probability methods are prone to high sampling bias and hence reduce representativeness compared to random sampling. One way representativeness can be assessed is by comparing key characteristics of the target population with those of the actual sample (for more details, refer to the analysis section).

Many academic surveys use convenience samples from an existing pool of the university's psychology students. Although not representative of most Americans, this type of sample is appropriate for investigating technology behavior among young people, such as sexting (Drouin & Landgraff, 2012; Weisskirch & Delevi, 2011), instant messaging (Anandarajan, Zaman, Dai, & Arinze, 2010; Junco & Cotten, 2011; Zaman et al., 2010), and mobile phone use (Auter, 2007; Harrison, 2011; Turner, Love, & Howell, 2008). Convenience samples have also been used to reach special populations. For example, because identifying HIV and tuberculosis patients through official lists of names is difficult due to patient confidentiality, one study about the viability of using cell phones and text messages in HIV and tuberculosis education handed out surveys to potential respondents in health clinic waiting rooms (Person, Blain, Jiang, Rasmussen, & Stout, 2011). Similarly, a study of Down's syndrome patients' use of computers invited participation through special interest listservs (Feng, Lazar, Kumin, & Ozok, 2010).

Determining the Appropriate Sample Size

No matter which sampling method is used, it is important to carefully determine the target sample size for the survey, i.e., the number of survey responses needed. If the sample size is too small, findings from the survey cannot be accurately generalized to the population and may fail to detect generalizable differences between groups. If the sample is larger than necessary, too many individuals are burdened with taking the survey, analysis time for the researcher may increase, and the sampling frame may be used up too quickly. Hence, calculating the optimal sample size is crucial for every survey.

First, the researcher needs to determine approximately how many people make up the population being studied. Second, as the survey does not measure the entire population, the required level of precision must be chosen; this consists of the margin of error and the confidence level. The margin of error expresses the amount of sampling error in the survey, i.e., the range of uncertainty around an estimate of a population measure, assuming normally distributed data. For example, if 60 % of the sample claims to use a tablet computer, a 5 % margin of error would mean that the true proportion of tablet users in the population is likely between 55 and 65 %. Commonly used margins of error are 5 and 3 %, but depending on the goals of the survey, anywhere between 1 and 10 % may be appropriate. Using a margin of error higher than 10 % is not recommended, unless a low level of precision can meet the survey's goals. The confidence level indicates how likely the reported metric falls within the margin of error if the study were repeated. A 95 % confidence level, for example, means that 95 % of the time, observations from repeated sampling will fall within the interval defined by the margin of error. Commonly used confidence levels are 99, 95, and 90 %; using less than 90 % is not recommended.
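The margin of error for a reported proportion can be reproduced with the standard normal approximation. A minimal sketch, with the sample size of 384 borrowed from the large-population example discussed in this section:

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error for a sample proportion via the normal
    approximation; z = 1.96 corresponds to a 95% confidence level."""
    return z * math.sqrt(p * (1 - p) / n)

# If 60% of 384 respondents report using a tablet, the margin of
# error is about +/-4.9%, so the population share plausibly lies
# between roughly 55% and 65%.
print(round(margin_of_error(0.60, 384), 3))  # prints 0.049
```

Note that the margin shrinks with the square root of n: quadrupling the sample only halves the margin of error, which is why precision beyond 3 % quickly becomes expensive.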

There are various formulas for calculating the target sample size. Figure 4, based on Krejcie and Morgan's formula (1970), shows the appropriate sample size, given the population size as well as the chosen margin of error and confidence level for your survey. Note that the table is based on a population proportion of 50 % for the response of interest, the most cautious estimation (i.e., when the proportion is higher or lower than 50 %, the sample size required to achieve the same margin of error declines). For example, for a population larger than 100,000, a sample size of 384 is required to achieve a confidence level of 95 % and a margin of error of 5 %. Note that for population sizes over about 20,000, the required sample size does not significantly increase. Researchers may set the sample size to 500 to estimate a single population parameter, which yields a margin of error of about ±4.4 % at a 95 % confidence level for large populations.
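Krejcie and Morgan's formula applies a finite-population correction to the large-population estimate. The sketch below reproduces several entries of the published table, using the cautious default proportion p = 0.5 mentioned above; the function name and rounding choice are this sketch's own:

```python
# z-scores for the two-sided confidence levels used in Fig. 4
Z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}

def sample_size(population, confidence=0.95, margin=0.05, p=0.5):
    """Required sample size per Krejcie and Morgan (1970), with the
    most cautious population proportion p = 0.5 by default."""
    z2 = Z[confidence] ** 2
    n = (z2 * population * p * (1 - p)) / (
        margin ** 2 * (population - 1) + z2 * p * (1 - p))
    return round(n)

print(sample_size(1_000_000))               # 384 responses needed
print(sample_size(100))                     # 80
print(sample_size(1000, confidence=0.90))   # 213
```

The finite-population correction in the denominator is what makes the required sample flatten out for populations beyond roughly 20,000, as noted above.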

After having determined the target sample size for the survey, the researcher now needs to work backwards to estimate the number of people to actually invite to the

Population      90% confidence            95% confidence            99% confidence
size            10%   5%   3%    1%       10%   5%    3%    1%      10%   5%    3%     1%
100              41   73   88    99        49   80    92    99       63   87    95     99
1,000            63  213  429   871        88  278   516   906      142  399   648    943
10,000           67  263  699  4035        95  370   964  4899      163  622  1556   6239
100,000          68  270  746  6335        96  383  1056  8762      166  659  1810  14227
1,000,000        68  270  751  6718        96  384  1066  9512      166  663  1840  16317
100,000,000      68  271  752  6763        96  384  1067  9594      166  663  1843  16560

Fig. 4 Sample size as a function of population size and accuracy (confidence level and margin of error)

survey, taking into account the estimated size of each subgroup and the expected response rate. If a subgroup's incidence is very small, the total number of invitations must be increased to ensure the desired sample size for that subgroup. The response rate of a survey describes the percentage of those who completed the survey out of all those who were invited (for more details, see the later sections on monitoring survey paradata and maximizing response rates). If a similar survey has been conducted before, its response rate is a good reference point for calculating the required number of invitations. If there is no prior response rate information, the survey can first be sent out to a small number of people to measure the response rate, which is then used to determine the total number of required invitations.

For example, assuming a 30 % response rate, a 50 % incidence rate for the group of interest, and the need for 384 complete responses from that group, 2,560 people should be invited to the survey. The calculation may also reveal that the required sample is larger than the sampling frame itself; in that case, the researcher may need to consider more qualitative methods as an alternative.
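The arithmetic in this example can be written down directly; the rates are the ones assumed above:

```python
import math

def invitations_needed(target_responses, response_rate, incidence=1.0):
    """Invitations required to collect a target number of complete
    responses, given an expected response rate and the subgroup's
    incidence within the sampling frame."""
    return math.ceil(target_responses / (response_rate * incidence))

# 384 complete responses at a 30% response rate and 50% incidence:
print(invitations_needed(384, 0.30, 0.50))  # prints 2560
```

Rounding up (rather than to the nearest integer) errs on the side of meeting the target, since under-inviting cannot be corrected after the survey closes.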

Mode and Methods of Survey Invitation

To reach respondents, there are four basic survey modes: mail or written surveys, phone surveys, face-to-face or in-person surveys, and Internet surveys. Survey modes may also be used in combination. The survey mode needs to be chosen carefully, as each mode has its own advantages and disadvantages, such as differences in typical response rates, introduced biases (Groves, 1989), required resources and costs, the audience that can be reached, and respondents' level of anonymity.

Today, many HCI-related surveys are Internet based, as the benefits often outweigh the disadvantages. Internet surveys have the following major advantages:

Internet surveys also have several disadvantages. The most discussed downside is the introduction of coverage error, i.e., a potential mismatch between the target population and the sampling frame (Couper, 2000; Groves, 1989). For example, online surveys fail to reach people without Internet or e-mail access. Furthermore, those invited to Internet surveys may be less motivated to respond or to provide accurate data because such surveys are less personal and can be ignored more easily. This survey mode also relies on the respondents' ability to use a computer and may only provide the researcher with minimal information about the survey respondents. (See the chapter on "Crowdsourcing in HCI Research.")

Questionnaire Design and Biases

Upon establishing the constructs to be measured and the appropriate sampling method, the first iteration of the survey questionnaire can be designed. It is important to carefully think through the design of each survey question (first acknowledged by Payne, 1951), as it is fairly easy to introduce biases that can have a substantial impact on the reliability and validity of the data collected. Poor questionnaire design may introduce measurement error, defined as the deviation of the respondents' answers from their true values on the measure. According to Couper (2000), measurement error in self-administered surveys can arise from the respondent (e.g., lack of motivation, comprehension problems, deliberate distortion) or from the instrument (e.g., poor wording or design, technical flaws). In most surveys, there is only one opportunity to deploy, and unlike qualitative research, no clarification or probing is possible. For these reasons, it is crucial that the questions accurately measure the constructs of interest.

This section covers the different types of survey questions, common questionnaire biases, question types to avoid, the reuse of established questionnaires, and visual survey design considerations.

Types of Survey Questions

There are two categories of survey questions: open- and closed-ended questions. Open-ended questions (Fig. 5) ask survey respondents to write in their own answers, whereas closed-ended questions (Fig. 6) provide a set of predefined answers to choose from.

Fig. 5 Example of a typical open-ended question: "What, if anything, do you find frustrating about your smartphone?"

Fig. 6 Example of a typical closed-ended question, a bipolar rating question in particular: "Overall, how satisfied or dissatisfied are you with your smartphone?"

Open-ended questions are appropriate when:

Fig. 7 Example of a single-choice question: "What is the highest level of education you have completed?"

Fig. 8 Example of a multiple-choice question: "Which of the following apps do you use daily on your smartphone?"

Types of Closed-Ended Survey Questions

There are four basic types of closed-ended questions: single-choice, multiple-choice, rating, and ranking questions.

  1. Single-choice questions work best when only one answer is possible for each respondent in the real world (Fig. 7).
  2. Multiple-choice questions are appropriate when more than one answer may apply to the respondent (Fig. 8). Frequently, multiple-choice questions are accompanied by "select all that apply" help text. The maximum number of selections may also be specified to force users to prioritize or express preferences among the answer options.
  3. Ranking questions are best when respondents must prioritize their choices given a real-world situation (Fig. 9).
  4. Rating questions are appropriate when the respondent must judge an object on a continuum. To optimize reliability and minimize bias, scale points need to be fully labeled instead of using numbers (Groves et al., 2004), and each scale point should be of equal width to avoid bias toward visually bigger response options (Tourangeau, Couper, & Conrad, 2004). Rating questions should use either a unipolar or a bipolar scale, depending on the construct being measured (Krosnick & Fabrigar, 1997; Schaeffer & Presser, 2003).

Fig. 9 Example of a ranking question: "Rank the following smartphone manufacturers in order of your preference. Add a number to each row, 1 being the least preferred, 5 being the most preferred."

Fig. 10 Example of a unipolar rating question: "How important is it to you to make phone calls from your smartphone?"

Unipolar constructs range from zero to an extreme amount and do not have a natural midpoint. They are best measured with a 5-point rating scale (Krosnick & Fabrigar, 1997), which optimizes reliability while minimizing respondent burden, and with the following scale labels, which have been shown to be semantically equidistant from each other (Rohrmann, 2003): "Not at all …," "Slightly …," "Moderately …," "Very …," and "Extremely …." Such constructs include importance (see Fig. 10), interest, usefulness, and relative frequency. Bipolar constructs range from an extreme negative to an extreme positive with a natural midpoint. Unlike unipolar constructs, they are best measured with a 7-point rating scale to maximize reliability and data differentiation (Krosnick & Fabrigar, 1997). Bipolar constructs may use the following scale labels: "Extremely …," "Moderately …," "Slightly …," "Neither … nor …," "Slightly …," "Moderately …," and "Extremely …." Such constructs include satisfaction (see Fig. 6, from dissatisfied to satisfied), perceived speed (from slow to fast), ease of use (from difficult to easy), and visual appeal (from unappealing to appealing).

When using a rating scale, the inclusion of a midpoint should be considered. While some may argue that including a midpoint provides an easy target for respondents who shortcut answering questions, others argue that the exclusion of a midpoint forces people who truly are in the middle to choose an option that does not reflect their actual opinion. O'Muircheartaigh, Krosnick, and Helic (2001) found that having a midpoint on a rating scale increases reliability, has no effect on validity, and does not result in lower data quality. Additionally, people who look for shortcuts ("shortcutters") are not more likely to select the midpoint when present. Omitting the midpoint, on the other hand, increases the amount of random measurement error, forcing those who actually feel neutral to make a random choice on either side of the scale. These findings suggest that a midpoint should be included when using a rating scale.

Questionnaire Biases

After writing the first survey draft, it is crucial to check the phrasing of each question for issues that may bias the responses. The following section covers five common questionnaire biases: satisficing, acquiescence bias, social desirability, response order bias, and question order bias.

Satisficing

Satisficing occurs when respondents use a suboptimal amount of cognitive effort to answer questions. Instead, satisficers will typically pick what they consider to be the first acceptable response alternative (Krosnick, 1991; Simon, 1956). Satisficers compromise one or more of the following four cognitive steps for survey response, as identified by Tourangeau (1984):

1. Comprehension of the question, instructions, and answer options
2. Retrieval of specific memories to aid with answering the question
3. Judgement of the retrieved information and its applicability to the question
4. Mapping of the judgement onto the answer options

Satisficers shortcut this process by exerting less cognitive effort or by skipping one or more steps entirely; satisficers use less effort to understand the question, to thoroughly search their memories, to carefully integrate all retrieved information, or to accurately pick the proper response choice (i.e., they pick the next best choice).

Satisficing can take weak and strong forms (Krosnick, 1999). Weak satisficers make an attempt to answer correctly yet are less than thorough, while strong satisficers may not search their memory for relevant information at all and simply select answers at random in order to complete the survey quickly. In other words, weak satisficers carelessly process all four cognitive steps, while strong satisficers typically skip the retrieval and judgement steps.

Respondents are more likely to satisfice when (Krosnick, 1991):

• Cognitive ability to answer is low.
• Motivation to answer is low.
• Question difficulty is high at one of the four stages, resulting in cognitive exertion.



To minimize satisficing, the following guidelines may be considered:

• Using the same rating scale in a series of back-to-back questions should be avoided, as potential satisficers may pick the same scale point for all answer options. This is known as straight-lining or item non-differentiation (Herzog & Bachman, 1981; Krosnick & Alwin, 1987, 1988).

• Long questionnaires should be avoided, since respondents will be less likely to optimally answer questions when they become increasingly fatigued and unmotivated (Cannell & Kahn, 1968; Herzog & Bachman, 1981).

• Respondent motivation can be increased by explaining the importance of the survey topic and that their responses are critical to the researcher (Krosnick, 1991).

Acquiescence Bias

When presented with agree/disagree, yes/no, or true/false statements, some respondents are more likely to concur with the statement independent of its substance. This tendency is known as acquiescence bias (Smith, 1967).

Respondents are more likely to acquiesce when:

To minimize acquiescence bias, the following may be considered:

H. Müller et al.

Social Desirability

Social desirability occurs when respondents answer questions in a manner they feel will be positively perceived by others (Goffman, 1959; Schlenker & Weigold, 1989). Favorable actions may be overreported, and unfavorable actions or views may be underreported. Topics that are especially prone to social desirability bias include voting behavior, religious beliefs, sexual activity, patriotism, bigotry, intellectual capabilities, illegal acts, acts of violence, and charitable acts.

Respondents are inclined to provide socially desirable answers when:

To minimize social desirability bias, respondents should be allowed to answer anonymously or the survey should be self-administered (Holbrook & Krosnick, 2010 ; Tourangeau & Smith, 1996 ; Tourangeau & Yan, 2007 ).

Response Order Bias

Response order bias is the tendency to select items toward the beginning (i.e., primacy effect) or the end (i.e., recency effect) of an answer list or scale (Chan, 1991; Krosnick & Alwin, 1987; Payne, 1971). Respondents unconsciously interpret the ordering of listed answer options and assume that items near each other are related, that top or left items are "first," and that middle answers in a scale without a natural order represent the typical value (Tourangeau et al., 2004). Primacy and recency effects are strongest when the list of answer options is long (Schuman & Presser, 1981) or when the options cannot be viewed as a whole (Couper et al., 2004).

• Unrelated answer options should be randomly ordered across respondents (Krosnick & Presser, 2010).

• Rating scales should be ordered from negative to positive, with the most negative item first.
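Per-respondent randomization of unrelated answer options can be sketched as follows; the option labels and the idea of seeding on a respondent ID are illustrative assumptions, not part of the original text:

```python
import random

def randomized_options(options, respondent_id):
    """Return a per-respondent ordering of unrelated answer options,
    so primacy and recency effects average out across the sample.
    Seeding on the respondent ID keeps each person's order stable
    across page reloads."""
    shuffled = list(options)
    random.Random(respondent_id).shuffle(shuffled)
    return shuffled

apps = ["Email", "Maps", "Camera", "Music"]  # hypothetical answer options
order_for_one_respondent = randomized_options(apps, respondent_id=101)
order_for_another = randomized_options(apps, respondent_id=202)
```

Rating scales, by contrast, would be left in their fixed negative-to-positive order; only answer lists without a natural order should be shuffled.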

Question Order Bias

Order effects also apply to the order of the questions in surveys. Each question in a survey has the potential to bias each subsequent question by priming respondents (Kinder & Iyengar, 1987 ; Landon, 1971 ).

The following guidelines may be considered:

Other Types of Questions to Avoid

Beyond the five common questionnaire biases mentioned above, there are additional question types that can result in unreliable and invalid survey data. These include broad, leading, double-barreled, recall, prediction, hypothetical, and prioritization questions.

Broad questions lack focus and include items that are not clearly defined or that can be interpreted in multiple ways. For example, "Describe the way you use your tablet computer" is too broad, as there are many aspects to using a tablet, such as its purpose, the applications used, and the locations of use. Instead of relying on the respondent to decide which aspects to report, the research goal as well as the core construct(s) should be determined beforehand and asked about in a focused manner. A more focused set of questions for the example above could be "Which apps did you use on your tablet computer over the last week?" and "Describe the locations in which you used your tablet computer last week."



Leading questions manipulate respondents into giving a certain answer by providing biasing content or suggesting information the researcher is looking to have confirmed. For example: "This application was recently ranked as number one in customer satisfaction. How satisfied are you with your experience today?". Another way questions can lead the respondent toward a certain answer is by asking the respondent to agree or disagree with a given statement, as in "Do you agree or disagree with the following statement: I use my smartphone more often than my tablet computer." Note that such questions can additionally result in acquiescence bias (as discussed above). To minimize the effects of leading questions, questions should be asked in a fully neutral way, without examples or additional information that may bias respondents toward a particular response.

Double-barreled questions ask about multiple items while only allowing for a single response, resulting in less reliable and valid data. Such questions can usually be detected by the presence of the word "and." For example, when asked "How satisfied or dissatisfied are you with your smartphone and tablet computer?", a respondent with differing attitudes toward the two devices will be forced to pick an attitude that either reflects just one device or the average across both devices. Questions with multiple items should be broken down into one question per construct or item.

Recall questions require the respondent to remember past attitudes and behaviors, leading to recall bias (Krosnick & Presser, 2010) and inaccurate recollections. When a respondent is asked "How many times did you use an Internet search engine over the past 6 months?", they will try to rationalize a plausible number, because recalling a precise count is difficult or impossible. Similarly, asking questions that compare past attitudes to current attitudes, as in "Do you prefer the previous or the current version of the interface?", may result in skewed data due to the difficulty of remembering past attitudes. Instead, questions should focus on the present, as in "How satisfied or dissatisfied are you with your smartphone today?", or use a recent time frame, for example, "In the past hour, how many times did you use an Internet search engine?". If the research goal is to compare attitudes or behaviors across different product versions or over time, the researcher should field separate surveys for each product version or time period and make the comparison themselves.

Prediction questions ask survey respondents to anticipate future behavior or attitudes, resulting in biased and inaccurate responses. Such questions include "Over the next month, how frequently will you use an Internet search engine?". Even more cognitively burdensome are hypothetical questions, i.e., questions asking the respondent to imagine a certain situation in the future and then predict their attitude or behavior in that situation. For example, "Would you purchase more groceries if the store played your favorite music?" and "How much would you like this Website if it used blue instead of red for its color scheme?" are hypothetical questions. Other frequently used hypothetical questions are those that ask the respondent to prioritize a future feature set, as in "Which of the following features would make you more satisfied with this product?". Even though the respondent may have a clear answer to this question, their response does not predict actual future usage of or satisfaction with the product if that feature were added. Such questions should be excluded from surveys entirely.

Leveraging Established Questionnaires

An alternative to constructing a brand new questionnaire is utilizing questionnaires developed by others. These usually benefit from prior validation and allow researchers to compare results with other studies that used the same questionnaire. When selecting an existing questionnaire, one should consider their particular research goals and study needs and adapt the questionnaire as appropriate. Below are commonly used HCI-related questionnaire instruments. Note that as survey research methodology has significantly advanced over time, each questionnaire should be assessed for potential sources of measurement error, such as the biases and the to-be-avoided question types mentioned previously.

• The Visual Aesthetics of Website Inventory (VisAWI) measures the visual aesthetics of a Website on the four subscales of simplicity, diversity, colorfulness, and craftsmanship (Moshagen & Thielsch, 2010).

Visual Survey Design Considerations

Researchers should also take into account their survey's visual design, since specific choices, including the use of images, spacing, and progress bars, may unintentionally bias respondents. This section summarizes such visual design aspects; for more details, refer to Couper (2008).

While objective images (e.g., product screenshots) can help clarify questions, context-shaping images can influence a respondent's mindset. For example, when asking respondents to rate their level of health, presenting an image of someone in a hospital bed has a framing effect that results in higher health ratings compared to that of someone jogging (Couper, Conrad, & Tourangeau, 2007).

The visual treatment of response options also matters. When asking closed-ended questions, uneven spacing between horizontal scale options results in a higher selection rate for scale points with greater spacing; evenly spaced scale options are recommended (Tourangeau, Couper, & Conrad, 2004). Drop-down lists, compared to radio buttons, have been shown to be harder and slower to use and to result in more accidental selections (Couper, 2011). Lastly, larger text fields increase the amount of text entered (Couper, 2011) but may intimidate respondents, potentially causing higher break-offs (i.e., drop-out rates).

Survey questions can be presented one per page, multiple per page, or all on one page. Research into pagination effects on completion rates is inconclusive (Couper, 2011). However, questions appearing on the same page may have higher correlations with each other, a sign of measurement bias (Peytchev, Couper, McCabe, & Crawford, 2006). In practice, most Internet surveys with skip logic use multiple pages, whereas very short questionnaires are often presented on a single page.

While progress bars are generally preferred by respondents and are helpful for short surveys, their use in long surveys or surveys with skip logic can be misleading and intimidating. Progress between pages in long surveys may be small, resulting in increased break-off rates (Callegaro, Villar, & Yang, 2011). On the other hand, progress bars are likely to increase completion rates for short surveys, where substantial progress is shown between pages.

R eview and Survey Pretesting

At this point in the survey life cycle, it is appropriate to have potential respondents take and evaluate the survey in order to identify any remaining points of confusion. For example, the phrase "mobile device" may be assumed by the researcher to include mobile phones, tablets, and in-car devices, while survey respondents may interpret it to mean mobile phones only. Or, when asking about communication tools used by the respondent, the provided list of answer choices may not actually include all possible options needed to properly answer the question. Two established evaluation methods used to improve survey quality are cognitive pretesting and field testing the survey by launching it to a subset of the actual sample, as described more fully in the remainder of this section. By evaluating surveys early on, the researcher can identify disconnects between their own assumptions and how respondents will read, interpret, and answer questions.

Cognitive Pretesting

To conduct a cognitive pretest, a small set of potential respondents is invited to participate in an in-person interview where they are asked to take the survey while using the think-aloud protocol (similar to a usability study). A cognitive pretest assesses question interpretation, construct validity, and comprehension of survey terminology and calls attention to missing answer options or entire questions (Bolton & Bronkhorst, 1995; Collins, 2003; Drennan, 2003; Presser et al., 2004). However, note that due to the testing environment, a cognitive pretest does not allow the researcher to understand contextual influences that may result in break-off or in not filling out the survey in the first place.

As part of a pretest, participants are asked the following for each question:

  1. "Read the entire question and describe it in your own words."
  2. "Select or write an answer while explaining your thought process."
  3. "Describe any confusing terminology or missing answer choices."

During the interview, the researcher should observe participant reactions; identify misinterpretations of terms, questions, answer choices, or scale items; and gain insight into how respondents process questions and come up with their answers. The researcher then needs to analyze the collected information to improve problematic areas before fielding the final questionnaire. A questionnaire may go through several rounds of iteration before reaching the desired quality.

Field Testing

Piloting the survey with a small subset of the sample will help provide insights that cognitive pretests alone cannot (Collins, 2003; Presser et al., 2004). Through field testing, the researcher can assess the success of the sampling approach, look for common break-off points and long completion times, and examine answers to open-ended questions. High break-off rates and completion times may point to flaws in the survey design (see the following section), while unusual answers may suggest a disconnect between a question's intention and respondents' interpretation. To yield additional insights from the field test, a question can be added at the end of each page or at the end of the entire survey where respondents can provide explicit feedback on any points of confusion. Similar to cognitive pretests, field testing may lead to several rounds of questionnaire improvement as well as changes to the sampling method. Finally, once all concerns are addressed, the survey is ready to be fielded to the entire sample.

I mplementation and Launch

When all questions are finalized, the survey is ready to be fielded based on the chosen sampling method. Respondents may be invited through e-mails to specifically named persons (e.g., respondents chosen from a panel), intercept pop-up dialogs while using a product or a site, or links placed directly in an application (see the sampling section for more details; Couper, 2000).

There are many platforms and tools that can be used to implement Internet surveys, such as ConfirmIt, Google Forms, Kinesis, LimeSurvey, SurveyGizmo, SurveyMonkey, UserZoom, Wufoo, and Zoomerang, to name just a few. When deciding on the appropriate platform, functionality, cost, and ease of use should be taken into consideration. The questionnaire may require a survey tool that supports functionality such as branching and conditionals, the ability to pass URL parameters, multiple languages, and a range of question types. Additionally, the researcher may want to customize the visual style of the survey or set up an automatic reporting dashboard, both of which may only be available on more sophisticated platforms.

Piping Behavioral Data into Surveys

Some platforms support the ability to combine survey responses with other log data, which is referred to as piping. Self-reported behaviors, such as frequency of use, feature usage, tenure, and platform usage, are less valid and reliable than the same metrics generated from log data. By merging survey responses with behavioral data, the researcher can more accurately understand the relationship between respondent characteristics and their behaviors or attitudes. For example, the researcher may find that certain types of users or levels of usage correlate with higher reported satisfaction. Behavioral data can either be passed to the results database as a parameter in the survey invitation link or combined later via a unique identifier for each respondent.
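As a minimal sketch, logged behavioral metrics keyed by a unique respondent ID can be merged into survey responses after collection. All field names here (`respondent_id`, `sessions_last_week`, `satisfaction`) are hypothetical illustrations:

```python
# Sketch: joining survey responses to behavioral log data via a shared
# respondent ID. Field names are hypothetical.

survey_responses = [
    {"respondent_id": "r1", "satisfaction": 6},
    {"respondent_id": "r2", "satisfaction": 3},
]

log_data = {
    "r1": {"sessions_last_week": 14},
    "r2": {"sessions_last_week": 2},
}

def pipe_behavioral_data(responses, logs):
    """Merge logged usage metrics into each survey response by ID."""
    merged = []
    for row in responses:
        metrics = logs.get(row["respondent_id"], {})
        merged.append({**row, **metrics})
    return merged

merged = pipe_behavioral_data(survey_responses, log_data)
```

The same join can equally be done in a database or a data-frame library; the key design point is that every response carries an identifier that also appears in the logs.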

Monitoring Survey Paradata

With the survey's launch, researchers should monitor the initial responses as well as survey paradata to identify potential mistakes in the survey design. Survey paradata is data collected about the survey response process, such as the devices from which the survey was accessed, time to survey completion, and various response-related rates. By monitoring such metrics, the survey researcher can quickly apply improvements before the entire sample has responded to the survey. The American Association for Public Opinion Research has specified a set of definitions for commonly used paradata metrics (AAPOR, 2011).

Response rates are dependent on a variety of factors, the combination of which makes it difficult to specify an acceptable response rate in HCI survey research. A meta-analysis of 31 e-mail surveys from 1986 to 2000 showed that average response rates for e-mail surveys typically fall between 30 and 40%, with follow-up reminders significantly increasing response rates (Sheehan, 2001). Another review of 69 e-mail surveys showed that response rates averaged around 40% (Cook, Heath, & Thompson, 2000). When inviting respondents through Internet intercept surveys (e.g., pop-up surveys or in-product links), response rates may be 15% or lower (Couper, 2000). Meta-analyses of mailed surveys showed that their response rates are 40–50% (Kerlinger, 1986) or 55% (Baruch, 1999). In experimental comparisons to mailed surveys, response rates to Internet e-mail surveys were about 10% lower (Kaplowitz, Hadlock, & Levine, 2004; Manfreda et al., 2008). Such meta-reviews also showed that overall response rates have been declining over several decades (Baruch, 1999; Baruch & Holtom, 2008; Sheehan, 2001); however, this decline seems to have stagnated around 1995 (Baruch & Holtom, 2008).

Maximizing Response Rates

In order to gather enough responses to represent the target population with the desired level of precision, response rates should be maximized. Several factors affect response rates, including the respondents' interest in the subject matter, the perceived impact of responding to the survey, questionnaire length and difficulty, the presence and nature of incentives, and researchers' efforts to encourage response (Fan & Yan, 2010).

Based on experimentation with invitation processes for mail surveys, Dillman (1978) developed the "Total Design Method" to optimize response rates. This method, consistently achieving response rates averaging 70% or better, consists of a timed sequence of four mailings: the initial request with the survey on week one, a reminder postcard on week two, a replacement survey to non-respondents on week four, and a second replacement survey to non-respondents by certified mail on week seven. Dillman incorporates social exchange theory into the Total Design Method by personalizing the invitation letters, using official stationery to increase trust in the survey's sponsorship, explaining the usefulness of the survey research and the importance of responding, assuring the confidentiality of respondents' data, and beginning the questionnaire with items directly related to the topic of the survey (1991). Recognizing the need to cover Internet and mixed-mode surveys, Dillman extended his prior work with the "Tailored Design Method." With this update, he emphasized customizing processes and designs to fit each survey's topic, population, and sponsorship (2007).

Another component of optimizing response rates is getting as many complete responses as possible from those who start the survey. According to Peytchev (2009), causes of break-off may fall into three broad categories.

The questionnaire design principles mentioned previously, such as making surveys as short as possible, keeping required questions to a minimum, using skip logic, and including progress bars for short surveys, may help minimize break-off.

Providing an incentive to encourage survey responses may be advantageous in certain cases. Monetary incentives tend to increase response rates more than non-monetary incentives (Singer, 2002). In particular, non-contingent incentives, which are offered to all people in the sample, generally outperform contingent incentives, given only upon completion of the survey (Church, 1993). This is true even when a non-contingent incentive is considerably smaller than a contingent incentive. One strategy to maximize the benefit of incentives is to offer a small non-contingent award to all invitees, followed by a larger contingent award to initial non-respondents (Lavrakas, 2011). An alternate form of contingent incentive is a lottery, where a drawing is held among respondents for a small number of monetary awards or other prizes. However, the efficacy of such lotteries is unclear (Stevenson, Dykema, Cyffka, Klein, & Goldrick-Rab, 2012). Although incentives will typically increase response rates, it is much less certain whether they increase the representativeness of the results. Incentives are likely most valuable when facing a small population or sampling frame, and high response rates are required for sufficiently precise measurements. Another case where incentives may help is when some groups in the sample have low interest in the survey topic (Singer, 2002). Furthermore, when there is a cost to contact each potential respondent, as with door-to-door interviewing, incentives will decrease costs by lowering the number of people that need to be contacted.

D ata Analysis and Reporting

Once all the necessary survey responses have been collected, it is time to start making sense of the data by:

  1. Preparing and exploring the data
  2. Thoroughly analyzing the data
  3. Synthesizing insights for the target audience of this research

Data Preparation and Cleaning

Cleaning and preparing survey data before conducting a thorough analysis is essential to identify low-quality responses that may otherwise skew the results. When taking a pass through the data, survey researchers should look for signs of poor-quality responses. Such survey data can either be left as is, removed, or presented separately from trusted data. If the researcher decides to remove poor data, they must cautiously decide whether to remove data at the respondent level (i.e., listwise deletion), at the individual question level (i.e., pairwise deletion), or only beyond a certain point in the survey where a respondent's data quality has declined. The following are signals that survey researchers should look out for at the survey response level:

• Duplicate responses. In a self-administered survey, a respondent might be able to fill out the survey more than once. If possible, respondent information such as name, e-mail address, or any other unique identifier should be used to remove duplicate responses.

• Speeders . Respondents that complete the survey faster than possible, speeders, may have carelessly read and answered the questions, resulting in arbitrary responses. The researcher should examine the distribution of response times and remove any respondents that are suspiciously fast.

• Straight-liners and other questionable patterns. Respondents that always, or almost always, pick the same answer option across survey questions are referred to as straight-liners. Grid-style questions are particularly prone to respondent straight-lining (e.g., by always picking the first answer option when asked to rate a series of objects). Respondents may also try to hide the fact that they are randomly choosing responses by answering in a fixed pattern (e.g., by alternating between the first and second answer options across questions). If a respondent straight-lines through the entire survey, the researcher may decide to remove the respondent's data entirely. If a respondent starts straight-lining at a certain point, the researcher may keep data up until that point.

• Missing data and break-offs. Some respondents may finish a survey but skip several questions. Others may start the survey but break off at some point. Both result in missing data. It should first be determined whether those who did not respond to certain questions are different from those who did. A non-response study should be conducted to assess the amount of non-response bias for each survey question. If those who did not answer certain questions are not meaningfully different from those who did, the researcher can consider leaving the data as is; however, if there is a difference, the researcher may choose to impute plausible values based on similar respondents' answers (De Leeuw, Hox, & Huisman, 2003).
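The response-level checks above can be sketched in a few lines of code. The field names and the 30%-of-median speeder threshold below are hypothetical illustrations, not standard values:

```python
import statistics

# Sketch of response-level cleaning: duplicates, speeders, straight-liners.
# All field names and thresholds are hypothetical illustrations.

responses = [
    {"id": "a@x.com", "seconds": 310, "grid": [4, 2, 5, 3, 4, 5]},
    {"id": "b@x.com", "seconds": 40,  "grid": [3, 4, 2, 5, 3, 4]},  # speeder
    {"id": "c@x.com", "seconds": 290, "grid": [5, 5, 5, 5, 5, 5]},  # straight-liner
    {"id": "a@x.com", "seconds": 305, "grid": [4, 2, 5, 3, 4, 5]},  # duplicate
    {"id": "d@x.com", "seconds": 330, "grid": [2, 4, 3, 5, 2, 4]},
]

def clean(rows):
    median_time = statistics.median(r["seconds"] for r in rows)
    seen, kept = set(), []
    for r in rows:
        if r["id"] in seen:
            continue                      # duplicate response
        if r["seconds"] < 0.3 * median_time:
            continue                      # speeder (illustrative cutoff)
        if len(set(r["grid"])) == 1:
            continue                      # straight-lined the grid question
        seen.add(r["id"])
        kept.append(r)
    return kept

kept = clean(responses)
```

In practice each flagged group would be inspected (or reported separately) rather than silently dropped, and the speeder cutoff would be chosen from the observed distribution of completion times.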

Furthermore, similar signals may need to be assessed at a question-by-question level.

Analysis of Closed-Ended Responses

To get an overview of what the survey data shows, descriptive statistics are fundamental. By looking at measures such as the frequency distribution, central tendency (e.g., mean or median), and data dispersion (e.g., standard deviation), emerging patterns can be uncovered. The frequency distribution shows the proportion of responses for each answer option. The central tendency measures the "central" position of a frequency distribution and is calculated using the mean, median, and mode. Dispersion examines the data spread around the central position through calculations such as standard deviation, variance, range, and interquartile range.
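For example, these measures can be computed directly with Python's standard library (the 7-point satisfaction ratings below are invented for illustration):

```python
import statistics
from collections import Counter

# Sketch: descriptive statistics for a hypothetical 7-point satisfaction item.

ratings = [5, 6, 7, 5, 4, 6, 5, 7, 3, 5]

frequency = Counter(ratings)          # frequency distribution per answer option
mean = statistics.mean(ratings)       # central tendency
median = statistics.median(ratings)
mode = statistics.mode(ratings)
stdev = statistics.stdev(ratings)     # dispersion (sample standard deviation)
```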

While descriptive statistics only describe the existing data set, inferential statistics can be used to draw inferences from the sample to the overall population in question. Inferential statistics consists of two areas: estimation statistics and hypothesis testing. Estimation statistics involves using the survey's sample in order to approximate the population's value. Either the margin of error or the confidence interval of the sample's data needs to be determined for such estimation. To calculate the margin of error for an answer option's proportion, only the sample size, the proportion, and a selected confidence level are needed. However, to determine the confidence interval for a mean, the standard error of the mean is additionally required. A confidence interval thus represents the estimated range of a population's mean at a certain confidence level.
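As a sketch, the margin of error for a proportion under the normal approximation needs only the proportion, the sample size, and the z-value for the chosen confidence level (1.96 for 95%); the figures below are illustrative:

```python
import math

# Sketch: margin of error for an answer option's proportion at a 95%
# confidence level (z = 1.96), using the normal approximation.

def margin_of_error(p, n, z=1.96):
    return z * math.sqrt(p * (1 - p) / n)

# e.g., 40% of 400 respondents picked an option
p, n = 0.40, 400
moe = margin_of_error(p, n)      # about 0.048, i.e., roughly ±4.8 points
ci = (p - moe, p + moe)          # 95% confidence interval for the proportion
```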

Hypothesis testing determines the probability of a hypothesis being true when comparing groups (e.g., means or proportions being the same or different) through the use of methods such as t-test, ANOVA, or Chi-square. The appropriate test is determined by the research question, the type of prediction by the researcher, and the type of variable (i.e., nominal, ordinal, interval, or ratio). An experienced quantitative researcher or statistician should be involved.
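As a minimal worked example, a Chi-square statistic for a hypothetical 2×2 table can be computed by hand and compared against the df = 1 critical value at α = 0.05; in practice a statistics package would also report the exact p-value:

```python
# Sketch: chi-square test of independence on a hypothetical 2x2 table
# (product version vs. satisfaction), computed by hand. In practice a
# statistics package (e.g., scipy.stats.chi2_contingency) would be used.

observed = [[30, 20],   # version A: satisfied, unsatisfied
            [18, 32]]   # version B: satisfied, unsatisfied

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi_sq += (obs - expected) ** 2 / expected

# Critical value for df = 1 at alpha = 0.05 is 3.841
significant = chi_sq > 3.841
```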

Inferential statistics can also be applied to identify connections among variables:

• Bivariate correlations are widely used to assess linear relationships between variables. For example, correlations can indicate which product dimensions (e.g., ease of use, speed, features) are most strongly associated with users' overall satisfaction.

• Linear regression analysis indicates the proportion of variance in a continuous dependent variable that is explained by one or more independent variables and the amount of change explained by each unit of an independent variable.
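Both measures can be sketched from first principles; the ease-of-use and satisfaction ratings below are invented for illustration:

```python
import math

# Sketch: Pearson correlation and simple linear regression, computed by
# hand, between a hypothetical ease-of-use rating and overall satisfaction.

ease = [2, 3, 4, 5, 6, 7]
satisfaction = [3, 3, 5, 6, 6, 7]

n = len(ease)
mean_x = sum(ease) / n
mean_y = sum(satisfaction) / n

cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(ease, satisfaction))
var_x = sum((x - mean_x) ** 2 for x in ease)
var_y = sum((y - mean_y) ** 2 for y in satisfaction)

r = cov / math.sqrt(var_x * var_y)   # Pearson correlation coefficient
slope = cov / var_x                  # regression coefficient (per-unit change)
intercept = mean_y - slope * mean_x
r_squared = r ** 2                   # proportion of variance explained
```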

Analysis of Open-Ended Comments

In addition to analyzing closed-ended responses, the review of open-ended comments contributes to a more holistic understanding of the phenomena being studied. Analyzing a large set of open-ended comments may seem like a daunting task at first; however, if done correctly, it reveals important insights that cannot otherwise be extracted from closed-ended responses. The analysis of open-ended survey responses can be derived from the method of grounded theory (Böhm, 2004; Glaser & Strauss, 1967) (see chapter on "Grounded Theory Methods").

An interpretive method, referred to as coding (Saldaña, 2009), is used to organize and transform qualitative data from open-ended questions to enable further quantitative analysis (e.g., preparing a frequency distribution of the codes or comparing the responses across groups). The core of such qualitative analysis is to assign one or several codes to each comment; each code consists of a word or a short phrase summarizing the essence of the response with regard to the objective of that survey question (e.g., described frustrations, behavior, sentiment, or user type). Available codes are chosen from a coding scheme, which may already be established by the community or from previous research or may need to be created by the researchers themselves. In most cases, as questions are customized to each individual survey, the researcher needs to establish the coding system using a deductive or an inductive approach.

When employing a deductive approach, the researcher defines the full list of possible codes in a top-down fashion; i.e., all codes are defined before reviewing the qualitative data and assigning those codes to comments. On the other hand, when using an inductive approach, the codes are generated and constantly revised in a bottom-up fashion; i.e., the data is coded into categories by reading and re-reading responses to the open-ended question. Bottom-up, inductive coding is recommended, as it has the benefit of capturing categories the researcher may not have thought of before reading the actual comments; however, it requires more coordination if multiple coders are involved. (See the "Grounded Theory Method" chapter for an analogous discussion.)

To measure the reliability of both the developed coding system and the coding of the comments, either the same coder should partially repeat the coding or a second coder should be involved. Intra-rater reliability describes the degree of agreement when the data set is reanalyzed by the same researcher. Inter-rater reliability (Armstrong, Gosling, Weinman, & Marteau, 1997; Gwet, 2001) determines the agreement level of the coding results from at least two independent researchers (using correlations or Cohen's kappa). If there is low agreement, the coding needs to be reviewed to identify the pattern behind the disagreement, coder training needs to be adjusted, or changes to codes need to be agreed upon to achieve consistent categorization. If the data set to be coded is too large and coding needs to be split up between researchers, inter-rater consistency can be measured by comparing results from coding an overlapping set of comments, by comparing the coding to a preestablished standard, or by including another researcher to review overlapping codes from the main coders.
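Cohen's kappa itself is straightforward to compute from the two coders' label sequences; the labels below are hypothetical:

```python
from collections import Counter

# Sketch: Cohen's kappa for two coders labeling the same set of comments.
# The labels are hypothetical.

coder_a = ["bug", "praise", "bug", "feature", "praise", "bug", "feature", "bug"]
coder_b = ["bug", "praise", "bug", "praise",  "praise", "bug", "feature", "feature"]

def cohens_kappa(a, b):
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Expected chance agreement from each coder's marginal label frequencies
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa(coder_a, coder_b)
```

Unlike raw percent agreement, kappa discounts the agreement that would occur by chance given each coder's label frequencies, which is why it is preferred for assessing inter-rater reliability.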

After having analyzed all comments, the researcher may prepare descriptive statistics such as a frequency distribution of codes, conduct inferential statistical tests, summarize identified key themes, prepare necessary charts, and highlight specifics through the use of representative quotes. To compare results across groups, inferential analysis methods can be used as described above for closed-ended data (e.g., t-tests, ANOVA, or Chi-square).


Assessing Representativeness

A key criterion in any survey's quality is the degree to which the results accurately represent the target population. If a survey's sampling frame fully covers the population and the sample is randomly drawn from the sampling frame, a response rate of 100% would ensure that the results are representative at a level of precision based on the sample size.

If, however, a survey has less than a 100 % response rate, those not responding might have provided a different answer distribution than those who did respond.

An example is a survey intended to measure attitudes and behaviors regarding a technology that became available recently. Since people who are early adopters of new technologies are usually very passionate about providing their thoughts and feedback, surveying users of this technology product would overestimate responses from early adopters (as compared to more occasional users) and the incidence of favorable attitudes toward that product. Thus, even a modest level of non-response can greatly affect the degree of non-response bias.

With response rates to major longitudinal surveys having decreased over time, much effort has been devoted to understanding non-response and its impact on data quality as well as methods of adjusting results to mitigate non-response error. Traditional survey assumptions held that maximizing response rates minimized non-response bias (Groves, 2006). Therefore, the results of Groves' 2006 meta-analysis were both surprising and seminal, finding no meaningful correlation between response rates and non-response error across mail, telephone, and face-to-face surveys.

Reporting Survey Findings

Once the question-by-question analysis is completed, the researcher needs to synthesize findings across all questions to address the goals of the survey. Larger themes may be identified, and the initially defined research questions are answered, which are in turn translated into recommendations and broader HCI implications as appropriate. All calculations used for the data analysis should be reported with the necessary statistical rigor (e.g., sample sizes, p-values, margins of error, and confidence levels). Furthermore, it is important to list the survey's paradata and include response and break-off rates (see section on monitoring survey paradata).

Similar to other empirical research, it is important to not only report the results of the survey but also describe the original research goals and the survey methodology used. A detailed description of the survey methodology should explain the population being studied, sampling method, survey mode, survey invitation, fielding process, and response paradata. It should also include screenshots of the actual survey questions and explain techniques used to evaluate data quality. Furthermore, it is often necessary to include a discussion of how the respondents compare to the overall population. Lastly, any potential sources of survey bias, such as sampling biases or non-response bias, should be outlined.

Exercises

  1. What are the differences between a survey and a questionnaire, both in concept and design?
  2. In your own research area, create a survey and test it with five classmates. How long do you think it will take a classmate to fill it out? How long did it take them?

Acknowledgements We would like to thank our employers Google, Inc. and Twitter, Inc. for making it possible for us to work on this chapter. There are many who contributed to this effort, and we would like to call out the most significant ones: Carolyn Wei for identifying published papers that used survey methodology for their work, Sandra Lozano for her insights on analysis, Mario Callegaro for inspiration, Ed Chi and Robin Jeffries for reviewing several drafts of this document, and Professors Jon Krosnick from Stanford University and Mick Couper from the University of Michigan for laying the foundation of our survey knowledge and connecting us to the broader survey research community.

References

Overview Books

Couper, M. (2008). Designing effective Web surveys. Cambridge, UK: Cambridge University Press.
Fowler, F. J., Jr. (1995). Improving survey questions: Design and evaluation (Vol. 38). Thousand Oaks, CA: Sage.
Groves, R. M. (1989). Survey errors and survey costs. Hoboken, NJ: Wiley.
Groves, R. M. (2004). Survey errors and survey costs (Vol. 536). Hoboken, NJ: Wiley-Interscience.
Groves, R. M., Fowler, F. J., Couper, M. P., Lepkowski, J. M., Singer, E., & Tourangeau, R. (2004). Survey methodology. Hoboken, NJ: Wiley.
Marsden, P. V., & Wright, J. (Eds.). (2010). Handbook of survey research (2nd ed.). Bingley, UK: Emerald Publishing Group Limited.

Sampling Methods

Aquilino, W. S. (1994). Interview mode effects in surveys of drug and alcohol use: A field experiment. Public Opinion Quarterly, 58(2), 210–240.
Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York, NY: Wiley.
Couper, M. P. (2000). Web surveys: A review of issues and approaches. Public Opinion Quarterly, 64, 464–494.
Kish, L. (1965). Survey sampling. New York, NY: Wiley.
Krejcie, R. V., & Morgan, D. W. (1970). Determining sample size for research activities. Educational and Psychological Measurement, 30, 607–610.
Lohr, S. L. (1999). Sampling: Design and analysis. Pacific Grove, CA: Duxbury Press.

Questionnaire Design

Bradburn, N. M., Sudman, S., & Wansink, B. (2004). Asking questions: The definitive guide to questionnaire design – for market research, political polls, and social and health questionnaires (Rev. ed.). San Francisco, CA: Jossey-Bass.
Cannell, C. F., & Kahn, R. L. (1968). Interviewing. The Handbook of Social Psychology, 2, 526–595.
Chan, J. C. (1991). Response-order effects in Likert-type scales. Educational and Psychological Measurement, 51(3), 531–540.
Costa, P. T., & McCrae, R. R. (1988). From catalog to classification: Murray's needs and the five-factor model. Journal of Personality and Social Psychology, 55(2), 258.
Couper, M. P., Tourangeau, R., Conrad, F. G., & Crawford, S. D. (2004). What they see is what we get: Response options for web surveys. Social Science Computer Review, 22(1), 111–127.
Edwards, A. L., & Kenney, K. C. (1946). A comparison of the Thurstone and Likert techniques of attitude scale construction. Journal of Applied Psychology, 30, 72–83.
Goffman, E. (1959). The presentation of self in everyday life (pp. 1–17). Garden City, NY: Doubleday.
Goldberg, L. R. (1990). An alternative description of personality: The big-five factor structure. Journal of Personality and Social Psychology, 59(6), 1216.
Herzog, A. R., & Bachman, J. G. (1981). Effects of questionnaire length on response quality. Public Opinion Quarterly, 45(4), 549–559.
Holbrook, A. L., & Krosnick, J. A. (2010). Social desirability bias in voter turnout reports: Tests using the item count technique. Public Opinion Quarterly, 74(1), 37–67.
Kinder, D. R., & Iyengar, S. (1987). News that matters: Television and American opinion. Chicago, IL: University of Chicago Press.
Krosnick, J. A. (1991). Response strategies for coping with the cognitive demands of attitude measures in surveys. Applied Cognitive Psychology, 5, 213–236.
Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50(1), 537–567.
Krosnick, J. A. (2002). The causes of no-opinion responses to attitude measures in surveys: They are rarely what they appear to be. In R. Groves, D. Dillman, J. Eltinge, & R. Little (Eds.), Survey non-response (pp. 87–100). New York, NY: Wiley.
Krosnick, J. A., & Alwin, D. F. (1987). Satisficing: A strategy for dealing with the demands of survey questions. Columbus, OH: Ohio State University.
Krosnick, J. A., & Alwin, D. F. (1988). A test of the form-resistant correlation hypothesis: Ratings, rankings, and the measurement of values. Public Opinion Quarterly, 52(4), 526–538.
Krosnick, J. A., & Fabrigar, L. A. (1997). Designing rating scales for effective measurement in surveys. In L. Lyberg et al. (Eds.), Survey measurement and process quality (pp. 141–164). New York, NY: Wiley.
Krosnick, J. A., Narayan, S., & Smith, W. R. (1996). Satisficing in surveys: Initial evidence. New Directions for Evaluation, 1996(70), 29–44.
Krosnick, J. A., & Presser, S. (2010). Question and questionnaire design. In P. V. Marsden & J. D. Wright (Eds.), Handbook of survey research (pp. 263–314). Bingley, UK: Emerald Group Publishing Limited.
Landon, E. L. (1971). Order bias, the ideal rating, and the semantic differential. Journal of Marketing Research, 8(3), 375–378.
O'Muircheartaigh, C. A., Krosnick, J. A., & Helic, A. (2001). Middle alternatives, acquiescence, and the quality of questionnaire data. In B. Irving (Ed.), Harris Graduate School of Public Policy Studies. Chicago, IL: University of Chicago.
Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of Personality and Social Psychology, 46(3), 598.
Payne, S. L. (1951). The art of asking questions. Princeton, NJ: Princeton University Press.
Payne, J. D. (1971). The effects of reversing the order of verbal rating scales in a postal survey. Journal of the Marketing Research Society, 14, 30–44.
Rohrmann, B. (2003). Verbal qualifiers for rating scales: Sociolinguistic considerations and psychometric data (Project report). Melbourne, Australia: University of Melbourne.
Saris, W. E., Revilla, M., Krosnick, J. A., & Shaeffer, E. M. (2010). Comparing questions with agree/disagree response options to questions with construct-specific response options. Survey Research Methods, 4(1), 61–79.
Schaeffer, N. C., & Presser, S. (2003). The science of asking questions. Annual Review of Sociology, 29, 65–88.
Schlenker, B. R., & Weigold, M. F. (1989). Goals and the self-identification process: Constructing desired identities. In L. Pervin (Ed.), Goal concepts in personality and social psychology (pp. 243–290). Hillsdale, NJ: Erlbaum.
Schuman, H., & Presser, S. (1981). Questions and answers in attitude surveys. New York, NY: Academic Press.
Simon, H. A. (1956). Rational choice and the structure of the environment. Psychological Review,

63 (2), 129–138.

Smith, D. H. (1967). Correcting for social desirability response sets in opinion-attitude survey

research. P ublic Opinion Quarterly, 31 , 87–94.

Stone, G. C., Gage, N. L., & Leavitt, G. S. (1957). Two kinds of accuracy in predicting another’s

responses. The Journal of Social Psychology, 45 (2), 245–254.

Tourangeau, R. (1984). Cognitive science and survey methods. Cognitive aspects of survey methodology: Building a bridge between disciplines (pp. 73–100). Washington, DC: National Academy Press.

Tourangeau, R., Couper, M. P., & Conrad, F. (2004). Spacing, position, and order: Interpre tive heuristics for visual features of survey questions. Public Opinion Quarterly, 68 (3), 368–393.

Tourangeau, R., Rips, L. J., & Rasinski, K. (2000). The psychology of survey response . Cambridge,

UK: Cambridge University Press.

Tourangeau, R., & Smith, T. W. (1996). Asking sensitive questions the impact of data collection

mode, question format, and question context. P ublic Opinion Quarterly, 60 (2), 275–304.

Tourangeau, R., & Yan, T. (2007). Sensitive questions in surveys. P sychological Bulletin,

133 (5), 859.

Villar, A., & Krosnick, J. A. (2011). Global warming vs. climate change, taxes vs. prices: Does

word choice matter? Climatic change, 105 (1), 1–12.

Visual Survey Design

Callegaro, M., Villar, A., & Yang, Y. (2011). A meta-analysis of experiments manipulating progress indicators in Web surveys. A nnual Meeting of the American Association for Public Opinion Research , Phoenix

Couper, M. (2011). Web survey methodology: Interface design, sampling and statistical inference.

Presentation at E USTAT-The Basque Statistics Institute , Vitoria-Gasteiz

Couper, M. P., Conrad, F. G., & Tourangeau, R. (2007). Visual context effects in Web surveys.

Public Opinion Quarterly, 71 (4), 623–634.

Peytchev, A., Couper, M. P., McCabe, S. E., & Crawford, S. D. (2006). Web survey design paging

versus scrolling. P ublic Opinion Quarterly, 70 (4), 596–607.

Yan, T., Conrad, F. G., Tourangeau, R., & Couper, M. P. (2011). Should I stay or should I go: The effects of progress feedback, promised task duration, and length of questionnaire on completing Web surveys. I nternational Journal of Public Opinion Research, 23 (2), 131–147.

E stablished Questionnaire Instruments

Brooke, J. (1996). SUS-A quick and dirty usability scale. Usability Evaluation in Industry,

189 , 194.

Chin, J. P., Diehl, V. A., & Norman, K. L. (1988, May). Development of an instrument measuring user satisfaction of the human-computer interface. In Proceedings of the SIGCHI Conference on Human factors in computing systems (pp. 213–218). New York, NY: ACM

Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results

of empirical and theoretical research. H uman Mental Workload, 1 , 139–183.

Kirakowski, J., & Corbett, M. (1993). SUMI: The software usability measurement inventory.

B ritish Journal of Educational Technology, 24 (3), 210–212.

Lewis, J. R. (1995). IBM computer usability satisfaction questionnaires: Psychometric evalua tion and instructions for use. I nternational Journal of Human‐Computer Interaction, 7 (1), 57–78. Moshagen, M., & Thielsch, M. T. (2010). Facets of visual aesthetics. I nternational Journal of

Human-Computer Studies, 68 (10), 689–709.

Questionnaire Evaluation

Bolton, R. N., & Bronkhorst, T. M. (1995). Questionnaire pretesting: Computer assisted coding of concurrent protocols. In N. Schwarz & S. Sudman (Eds.), A nswering questions (pp. 37–64). San Francisco: Jossey-Bass.

Collins, D. (2003). Pretesting survey instruments: An overview of cognitive methods. Quality

of Life Research an International Journal of Quality of Life Aspects of Treatment Care and Rehabilitation, 12 (3), 229–238.

Drennan, J. (2003). Cognitive interviewing: Verbal data in the design and pretesting of question

naires. J ournal of Advanced Nursing, 42 (1), 57–63.

Presser, S., Couper, M. P., Lessler, J. T., Martin, E., Martin, J., Rothgeb, J. M., et al. (2004). Methods

for testing and evaluating survey questions. P ublic Opinion Quarterly, 68 (1), 109–130.

Survey Response Rates and Non-response

American Association for Public Opinion Research, AAPOR. (2011). Standard defi nitions: Final dispositions of case codes and outcome rates for surveys . (7th ed). http://aapor.org/Content/ NavigationMenu/AboutAAPOR/StandardsampEthics/StandardDefinitions/Standard Defi nitions2011.pdf

Baruch, Y. (1999). Response rates in academic studies: A comparative analysis. H uman Relations,

52 , 421–434.

Baruch, Y., & Holtom, B. C. (2008). Survey response rate levels and trends in organizational

research. H uman Relations, 61 (8), 1139–1160.

Church, A. H. (1993). Estimating the effect of incentives on mail survey response rates: A meta

analysis. P ublic Opinion Quarterly, 57 , 62–79.

Cook, C., Heath, F., & Thompson, R. L. (2000). A meta-analysis of response rates in Web- or

Internet-based surveys. E ducational and Psychological Measurement, 60 (6), 821–836.

Dillman, D. A. (1978). M ail and telephone surveys: The total design method . New York: Wiley. Dillman, D. A. (1991). The design and administration of mail surveys. A nnual Review of Sociology,

17 , 225–249.

Dillman, D. A. (2007). M ail and Internet surveys: The tailored design method (2nd ed.). Hoboken,

NJ: Wiley.

Fan, W., & Yan, Z. (2010). Factors affecting response rates of the web survey: A systematic review.

Computers in Human Behavior, 26 (2), 132–139.

Groves, R. M. (2006). Non-response rates and non-response bias in household surveys. P ublic

Opinion Quarterly, 70 , 646–75.

Groves, R. M., Presser, S., & Dipko, S. (2004). The role of topic interest in survey participation

decisions. P ublic Opinion Quarterly, 68 (1), 2–31.

Kaplowitz, M. D., Hadlock, T. D., & Levine, R. (2004). A comparison of web and mail survey

response rates. P ublic Opinion Quarterly, 68 (1), 94–101.

Kerlinger, F. N. (1986). Foundations of behavioral research (3rd ed.). New York: Holt, Rinehart &

Winston.

Kiesler, S., & Sproull, L. S. (1986). Response effects in the electronic survey. Public Opinion

Quarterly, 50 , 402–413.

Lavrakas, P. J. (2011). The use of incentives in survey research. 66th Annual Conference of the

American Association for Public Opinion Research

Lin, I., & Schaeffer, N. C. (1995). Using survey participants to estimate the impact of nonparticipation.

Public Opinion Quarterly, 59 (2), 236–258.

Lu, H., & Gelman, A. (2003). A method for estimating design-based sampling variances for surveys

with weighting, poststratifi cation, and raking. Journal of Offi cial Statistics, 19 (2), 133–152. Manfreda, K. L., Bosnjak, M., Berzelak, J., Haas, I., Vehovar, V., & Berzelak, N. (2008). Web surveys versus other survey modes: A meta-analysis comparing response rates. J ournal of the Market Research Society, 50 (1), 79.

Olson, K. (2006). Survey participation, non-response bias, measurement error bias, and total bias.

Public Opinion Quarterly, 70 (5), 737–758. Peytchev, A. (2009). Survey breakoff. P ublic Opinion Quarterly, 73 (1), 74–97. Schonlau, M., Van Soest, A., Kapteyn, A., & Couper, M. (2009). Selection bias in web surveys and the use of propensity scores. Sociological Methods & Research, 37 (3), 291–318. Sheehan, K. B. (2001). E-mail survey response rates: A review. J ournal of Computer Mediated

Communication, 6 (2), 1–16. Singer, E. (2002). The use of incentives to reduce non-response in household surveys. In R.

Groves, D. Dillman, J. Eltinge, & R. Little (Eds.), Survey non-response (pp. 87–100). New York: Wiley. 163–177.

Stevenson, J., Dykema, J., Cyffka, C., Klein, L., & Goldrick-Rab, S. (2012). What are the odds?

Lotteries versus cash incentives. Response rates, cost and data quality for a Web survey of low- income former and current college students. 67th Annual Conference of the American Association for Public Opinion Research

Survey Analysis

Armstrong, D., Gosling, A., Weinman, J., & Marteau, T. (1997). The place of inter-rater reliability

in qualitative research: An empirical study. Sociology, 31 (3), 597–606.

Böhm, A. (2004). Theoretical coding: Text analysis in grounded theory. In A companion to qualita

tive research , London: SAGE. pp. 270–275.

De Leeuw, E. D., Hox, J. J., & Huisman, M. (2003). Prevention and treatment of item nonresponse.

J ournal of Offi cial Statistics, 19 (2), 153–176.

Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory: Strategies for qualitative

research . Hawthorne, NY: Aldine de Gruyter.

Gwet, K. L. (2001). H andbook of inter-rater reliability . Gaithersburg, MD: Advanced Analytics,

LLC.

Heeringa, S. G., West, B. T., & Berglund, P. A. (2010). A pplied survey data analysis . Boca Raton,

FL: Chapman & Hall/CRC.

Lee, E. S., Forthofer, R. N., & Lorimor, R. J. (1989). A nalyzing complex survey data . Newbury

Park, CA: Sage.

Saldaña, J. (2009). The coding manual for qualitative researchers . Thousand Oaks, CA: Sage

Publications Limited.

Other References

Abran, A., Khelifi , A., Suryn, W., & Seffah, A. (2003). Usability meanings and interpretations in

ISO standards. Software Quality Journal, 11 (4), 325–338.

Anandarajan, M., Zaman, M., Dai, Q., & Arinze, B. (2010). Generation Y adoption of instant messaging: An examination of the impact of social usefulness and media richness on use richness. I EEE Transactions on Professional Communication, 53 (2), 132–143.

Archambault, A., & Grudin, J. (2012). A longitudinal study of facebook, linkedin, & twitter use. In

Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems

(CHI '12) (pp. 2741–2750). New York: ACM

Auter, P. J. (2007). Portable social groups: Willingness to communicate, interpersonal communication

gratifi cations, Communications, 5 (2), 139–156.

and cell phone use among young adults. I nternational Journal of Mobile

Calfee, J. E., & Ringold, D. J. (1994). The 70 % majority: Enduring consumer beliefs about adver

tising. J ournal of Public Policy & Marketing, 13 (2).

Chen, J., Geyer, W., Dugan, C., Muller, M., & Guy, I. (2009). Make new friends, but keep the old:

Recommending people on social networking sites. In Proceedings of the 27th International Conference on Human Factors in Computing Systems (CHI '09) , (pp. 201–210). New York: ACM Clauser, B. E. (2007). The life and labors of Francis Galton: A review of four recent books about the father of behavioral statistics. J ournal of Educational and Behavioral Statistics, 32 (4), 440–444.

Converse, J. (1987). Survey research in the United States: Roots and emergence 1890–1960 .

Berkeley, CA: University of California Press.

Drouin, M., & Landgraff, C. (2012). Texting, sexting, and attachment in college students’ romantic

relationships. Computers in Human Behavior, 28 , 444–449.

Feng, J., Lazar, J., Kumin, L., & Ozok, A. (2010). Computer usage by children with down syndrome: Challenges and future research. A CM Transactions on Accessible Computing, 2 (3), 35–41.

Froelich, J., Findlater, L., Ostergren, M., Ramanathan, S., Peterson, J., Wragg, I., et al. (2012). The design and evaluation of prototype eco-feedback displays for fi xture-level water usage data. In

Proceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems

(CHI '12) (pp. 2367–2376). New York: ACM

Harrison, M. A. (2011). College students’ prevalence and perceptions of text messaging while

driving. A ccident Analysis and Prevention, 43 , 1516–1520.

Junco, R., & Cotten, S. R. (2011). Perceived academic effects of instant messaging use. Computers

& Education, 56 , 370–378.

Katosh, J. P., & Traugott, M. W. (1981). The consequences of validated and self-reported voting

measures. P ublic Opinion Quarterly, 45 (4), 519–535.

Nacke, L. E., Grimshaw, M. N., & Lindley, C. A. (2010). More than a feeling: Measurement of sonic user experience and psychophysiology in a fi rst-person shooter game. I nteracting with Computers, 22 (5), 336–343.

Obermiller, C., & Spangenberg, E. R. (1998). Development of a scale to measure consumer skepti

cism toward advertising. J ournal of Consumer Psychology, 7 (2), 159–186.

Obermiller, C., & Spangenberg, E. R. (2000). On the origin and distinctiveness of skepticism

toward advertising. M arketing Letters, 11 , 311–322.

Person, A. K., Blain, M. L. M., Jiang, H., Rasmussen, P. W., & Stout, J. E. (2011). Text messaging for enhancement of testing and treatment for tuberculosis, human immunodefi ciency virus, and syphilis: A survey of attitudes toward cellular phones and healthcare. Telemedicine Journal and e-Health, 17 (3), 189–195.

Pitkow, J. E., & Recker, M. (1994). Results from the fi rst World-Wide web user survey. Computer

Networks and ISDN Systems, 27 (2), 243–254.

Rodden, R., Hutchinson, H., & Fu, X. (2010). Measuring the user experience on a large scale: User

centered metrics for web applications. In Proceedings of the 28th International Conference on Human Factors in Computing Systems (CHI '10) (pp. 2395–2398) ACM, New York, NY, USA Schild, J., LaViola, J., & Masuch, M. (2012). Understanding user experience in stereoscopic 3D games. In P roceedings of the 2012 ACM Annual Conference on Human Factors in Computing Systems (CHI '12) (pp. 89–98). New York: ACM

Shklovski, I., Kraut, R., & Cummings, J. (2008). Keeping in touch by technology: Maintaining friendships after a residential move. In P roceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems (CHI '08) (pp. 807–816). New York: ACM

Turner, M., Love, S., & Howell, M. (2008). Understanding emotions experienced when using a mobile phone in public: The social usability of mobile (cellular) telephones. Telematics and Informatics, 25 , 201–215.

Weisskirch, R. S., & Delevi, R. (2011). “Sexting” and adult romantic attachment. Computers in

Human Behavior, 27 , 1697–1701.

Wright, P. J., & Randall, A. K. (2012). Internet pornography exposure and risky sexual behavior

among adult males in the United States. Computers in Human Behavior, 28 , 1410–1416.

Yew, J., Shamma, D. A., & Churchill, E. F. (2011). Knowing funny: Genre perception and catego rization in social video sharing. In Proceedings of the 2011 Annual Conference on Human Factors in Computing Systems (CHI '11) (pp. 297–306). New York: ACM

Zaman, M., Rajan, M. A., & Dai, Q. (2010). Experiencing fl ow with instant messaging and its

facilitating role on creative behaviors. Computers in Human Behavior, 26 , 1009–1018.

Human-Computer Interaction

4 Scientific Foundations

In the last chapter, we examined a variety of interaction topics in HCI. By and large, the research methodology for studying these topics is empirical and scientific. Ideas are conceived, developed, and implemented and then framed as hypotheses that are tested in experiments. This chapter presents the enabling features of this methodology. Our goal is to establish the what, why, and how of research, with a focus on research that is both empirical and experimental. While much of the discussion is general, the examples are directed at HCI. We begin with the terminology surrounding research and empirical research.

4.1 What is research?

Research means different things to different people. “Being a researcher” or “conducting research” carries a certain elevated status in universities, colleges, and corporations. Consequently, the term research is bantered around in a myriad of situations. Often, the word is used simply to add weight to an assertion (“Our research shows that …”). While writing an early draft of this chapter, a television ad for an Internet service provider was airing in southern Ontario. The ad proclaimed, “Independent research proves [name_of_product] is the fastest and most reliable—period.”1

One might wonder about the nature of the research, or of the independence and impartiality of the work. Of course, forwarding assertions to promote facts, observations, hypotheses, and the like is often the goal. But what is research? Surely, it is more than just a word to add force to a statement or opinion. To rise above conjecture, we demand evidence—evidence meeting a standard of credibility such that the statement is beyond dispute. Providing such credibility is the goal of research.

Returning to the word itself, research has at least three definitions. First, conducting research can be an exercise as simple as careful or diligent search.2 So carefully searching one’s garden to find and remove weeds meets one standard of conducting research. Or perhaps one undertakes a search on a computer to locate all files modified on a certain date. That’s research. It’s not the stuff of MSc or PhD theses, but it meets one definition of research.

1 Advertisement by Rogers Communications Inc. airing on television in southern Ontario during the winter of 2008/2009.

2 www.merriam-webster.com.

Human-Computer Interaction. © 2013 Elsevier Inc. All rights reserved.

The second definition of research is collecting information about a particular subject. So surveying voters to collect information on political opinions is conducting research. In HCI we might observe people interacting with an interface and collect information about their interactions, such as the number of times they consulted the manual, clicked the wrong button, retried an operation, or uttered an expletive. That’s research.

The third definition is more elaborate: research is investigation or experimentation aimed at the discovery and interpretation of facts and revision of accepted theories or laws in light of new facts.

In this definition we find several key elements of research that motivate discussions in this book. We find the idea of experimentation. Conducting experiments is a central activity in a lot of HCI research. I will say more about this in the next chapter. In HCI research, an experiment is sometimes called a user study. The methodology is sometimes formal, sometimes ad hoc. A formal and standardized methodology is generally preferred because it brings consistency to a body of work and facilitates the review and comparison of research from different studies. One objective of this book is to promote the use of a consistent methodology for experimental research in HCI.

To be fair, the title of this book changed a few times on the way to press. Is the book about experimental research? Well, yes, a lot of it is, but there are important forms of HCI research that are non-experimental. So as not to exclude these, the focus shifted to empirical research, a broader term that encompasses both experimental and non-experimental methodologies. Among the latter is building and testing models of interaction, which we examine formally in Chapter 7.

Returning to research, the third definition speaks of facts. Facts are the building blocks of evidence, and it is evidence we seek in experimental research. For example, we might observe that a user committed three errors while entering a command with an interface. That’s a fact. Of course, context is important. Did the user have prior experience with the interface, or with similar interfaces? Was the user a child or a computer expert? Perhaps we observed and counted the errors committed by a group of users while interacting with two different interfaces over a period of time. If they committed 15 percent more errors with one interface than with the other, the facts are more compelling (but, again, context is important). Collectively, the facts form an outward sign leading to evidence—evidence that one interface is better, or less error prone, than the other. Evidence testing is presented in more detail in Chapter 6, Hypothesis Testing. Note that prove or proof is not used here. In HCI research we don’t prove things; we gather facts and formulate and test evidence.

The third definition mentions theories and laws. Theory has two common meanings. In the sense of Darwin’s theory of evolution or Einstein’s theory of relativity, the term theory is synonymous with hypothesis. In fact, one definition of theory is simply “a hypothesis assumed for the sake of argument or investigation.” Of course, through experimentation, these theories advanced beyond argument and investigation. The stringent demands of scientific inquiry confirmed the hypotheses of these great scientists. When confirmed through research, a theory becomes a scientifically accepted body of principles that explain phenomena.

A law is different from a theory. A law is more specific, more constraining, more formal, more binding. In the most exacting terms, a law is a relationship or phenomenon that is “invariable under given conditions.” Because variability is germane to human behavior, laws are of questionable relevance to HCI. Of course, HCI has laws. Take HCI’s best-known law as an example. Fitts’ law refers to a body of work, originally in human motor behavior (Fitts, 1954), but now widely used in HCI. Fitts’ work pertained to rapid aimed movements, such as rapidly moving a cursor to an object and selecting it in a graphical user interface. Fitts himself never proposed a law. He proposed a model of human motor behavior. And by all accounts, that’s what Fitts’ law is—a model, a behavioral, descriptive, and predictive model. It includes equations and such for predicting the time to do point-select tasks. It is a law only in that other researchers took up the label as a celebration of the generality and importance of Fitts’ seminal work. We should all be so lucky. Fitts’ law is presented in more detail in Chapter 7.
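Since the equations themselves are deferred to Chapter 7, a minimal sketch of the prediction may help. It uses the Shannon formulation common in HCI research, MT = a + b log2(D/W + 1); the intercept a and slope b below are illustrative placeholders, since in practice they are obtained by regression on observed movement times.

```python
import math

def fitts_mt(distance, width, a=0.2, b=0.16):
    """Predicted movement time (seconds) for a point-select task.

    Shannon formulation: MT = a + b * ID, where the index of
    difficulty ID = log2(D/W + 1) is measured in bits.
    The coefficients a and b are hypothetical, not from Fitts (1954).
    """
    index_of_difficulty = math.log2(distance / width + 1)
    return a + b * index_of_difficulty

# Doubling the distance to a target (or halving its width)
# raises the index of difficulty, and so the predicted time.
print(fitts_mt(distance=160, width=20))
```

The descriptive and predictive character of the model is visible here: the same two fitted coefficients predict times for any distance/width combination.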

Research, according to the third definition, involves discovery, interpretation, and revision. Discovery is obvious enough. That’s what we do—look for, or discover, things that are new and useful. Perhaps the discovery is a new style of interface or a new interaction technique. Interpretation and revision are central to research. Research does not proceed in a vacuum. Today’s research builds on what is already known or assumed. We interpret what is known; we revise and extend through discovery.

There are additional characteristics of research that are not encompassed in the dictionary definitions. Let’s examine a few of these.

4.1.1 Research must be published

Publication is the final step in research. It is also an essential step. Never has this rung as true as in the edict publish or perish. Researchers, particularly in academia, must publish. A weak or insufficient list of publications might spell disappointment when applying for research funds or for a tenure-track professorship at a university. Consequently, developing the skill to publish begins as a graduate student and continues throughout one’s career as a researcher, whether in academia or industry. The details and challenges in writing research papers are elaborated in Chapter 8.

Publishing is crucial, and for good reason. Until it is published, the knowledge gained through research cannot achieve its critical purpose—to extend, refine, or revise the existing body of knowledge in the field. This is so important that publication bearing a high standard of scrutiny is required. Not just any publication, but publication in archived peer-reviewed journals or conference proceedings. Research results are “written up,” submitted, and reviewed for their integrity, relevance, and contribution. The review is by peers—other researchers doing similar work. Are the results novel and useful? Does the evidence support the conclusions? Is there a contribution to the field? Does the methodology meet the expected standards for research? If these questions are satisfactorily answered, the work has a good chance of acceptance and publication. Congratulations. In the end, the work is published and archived. Archived implies the work is added to the collection of related work accessible to other researchers throughout the world. This is the “existing body of knowledge” referred to earlier. The final step is complete.

Research results are sometimes developed into bona fide inventions. If an individual or a company wishes to profit from their invention, then patenting is an option. The invention is disclosed in a patent application, which also describes previous related work (prior art), how the invention addresses a need, and the best mode of implementation. If the application is successful, the patent is granted and the inventor or company thereafter owns the rights to the invention. If another company wishes to use the invention for commercial purposes, they must enter into a license agreement with the patent holder. This side note is included only to make a small point: a patent is a publication. By patenting, the individual or company is not only retaining ownership of the invention but is also making it public through publication of the patent. Thus, patents meet the must-publish criterion for research.

4.1.2 Citations, references, impact

Imagine the World Wide Web without hyperlinks. Web pages would live in isolation, without connections between them. Hyperlinks provide the essential pathways that connect web pages to other web pages, thus providing structure and cohesion to a topic or theme. Similarly, it is hard to imagine the world’s body of published research without citations and references. Citations, like hyperlinks, connect research papers to other research papers. Through citations, a body of research takes shape. The insights and lessons of early research inform and guide later research. The citation itself is just an abbreviated tag that appears in the body of a paper, for example, “… as noted in earlier research (Smith and Jones, 2003)” or “… as confirmed by Green et al. [5].” These two examples are formatted differently and follow the requirements of the conference or journal. The citation is expanded into a full bibliographic entry in the reference list at the end of the paper. Formatting of citations and references is discussed in Chapter 8.

Citations serve many purposes, including supporting intellectual honesty. By citing previous work, researchers acknowledge that their ideas continue, extend, or refine those in earlier research. Citations are also important to back up assertions that are otherwise questionable, for example, “the number of tablet computer users worldwide now exceeds two billion [9].” In the Results section of a research paper, citations are used to compare the current results with those from earlier research, for example, “the mean time to formulate a search query was about 15 percent less than the time reported by Smith and Jones [5].”

Figure 4.1 provides a schematic of a collection of research papers. Citations are shown as arrows. It incorporates a timeline, so all arrows point to the left, to earlier papers. One of the papers seems to have quite a few citations to it. The number of citations to a research paper is a measure of the paper’s impact. If many researchers cite a single paper, there is a good chance the work described in the cited paper is both of high quality and significant to the field. This point is often echoed in academic circles: “The only objective and transparent metric that is highly correlated with the quality of a paper is the number of citations.”3 Interestingly enough, citation counts are only recently easily available. Before services like Google Scholar emerged, citation counts were difficult to obtain.

FIGURE 4.1

A collection of research papers with citations to earlier papers.
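The arrangement in Figure 4.1 is simply a directed graph whose arrows point backward in time, and a paper’s citation count is its number of incoming arrows. A small sketch, using hypothetical paper labels:

```python
from collections import Counter

# Each (hypothetical) paper lists the earlier papers it cites.
citations = {
    "paper_B": ["paper_A"],
    "paper_C": ["paper_A"],
    "paper_D": ["paper_A", "paper_B"],
}

# A paper's impact, by this measure, is how many later papers cite it.
counts = Counter(ref for refs in citations.values() for ref in refs)
print(counts["paper_A"])  # → 3
```

Services like Google Scholar effectively maintain this incoming-edge count across the entire archived literature.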

Since citation counts are available for individual papers, they are also easy to compile for individual researchers. Thus, impact can be assessed for researchers as well as for papers. The most accepted single measure of the impact of a researcher’s publication record is the H-index. If a researcher’s publications are ordered by the number of citations to each paper, the H-index is the point where the rank equals the number of citations. In other words, a researcher with H-index = n has n publications each with n or more citations. Physicist J. Hirsch first proposed the H-index in 2005 (Hirsch, 2005). H-index quantifies in a single number both research productivity (number of publications) and overall impact of a body of work (number of citations). Some of the strengths and weaknesses of the H-index, as a measure of impact, are elaborated elsewhere (MacKenzie, 2009a).

3 Dianne Murray, General Editor, Interacting with Computers. Posted to chi-announcements@acm.org on Oct 8, 2008.
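The rank-equals-citations rule translates directly into a few lines of code. A minimal sketch, with invented citation counts:

```python
def h_index(citation_counts):
    """Largest n such that n papers have at least n citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # this paper still has as many citations as its rank
        else:
            break
    return h

# Five papers with 10, 8, 5, 4, and 3 citations: four papers have
# at least 4 citations each, but there are not five with at least 5.
print(h_index([10, 8, 5, 4, 3]))  # → 4
```

Note how the measure caps productivity by impact and vice versa: many uncited papers, or one highly cited paper, cannot by themselves raise the index.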

4.1.3 Research must be reproducible

Research that cannot be replicated is useless. Achieving an expected standard of reproducibility, or repeatability, is therefore crucial. This is one reason for advancing a standardized methodology: it enforces a process for conducting and writing about the research that ensures sufficient detail is included to allow the results to be replicated. If skilled researchers care to test the claims, they will find sufficient guidance in the methodology to reproduce, or replicate, the original research. This is an essential characteristic of research.

Many great advances in science and research pertain to methodology. A significant contribution by Louis Pasteur (1822–1895), for example, was his use of a consistent methodology for his research in microbiology (Day and Gastel, 2006, pp. 8–9). Pasteur’s experimental findings on germs and diseases were, at the time, controversial. As Pasteur realized, the best way to fend off skepticism was to empower critics—other scientists—to see for themselves. Thus, he adopted a methodology that included a standardized and meticulous description of the materials and procedure. This allowed his experiments and findings to be replicated. A researcher questioning a result could redo the experiment and therefore verify or refute the result. This was a crucial advance in science. Today, reviewers of manuscripts submitted for publication are often asked to critique the work on this very point: “Is the work replicable?” “No” spells certain rejection.

One of the most cited papers in publishing history is a method paper. Lowry et al.'s 1951 paper "Protein Measurement With the Folin Phenol Reagent" has garnered in excess of 200,000 citations (Lowry, Rosebrough, Farr, and Randall, 1951).4 The paper describes a method for measuring proteins in fluids. In style, the paper reads much like a recipe. The method is easy to read, easy to follow, and, importantly, easy to reproduce.

4.1.4 Research versus engineering versus design

There are many ways to distinguish research from engineering and design. Researchers often work closely with engineers and designers, but the skills and contributions each brings are different. Engineers and designers are in the business of building things. They create products that strive to bring together the best in form (design emphasis) and function (engineering emphasis). One can imagine that there is a certain tension, even a trade-off, between form and function. Finding the right balance is key. However, sometimes the balance tips one way or the other. When this occurs, the result is a product or a feature that achieves one (form or function) at the expense of the other. An example is shown in Figure 4.2a. The image shows part of a notebook computer, manufactured by a well-known computer company. By most accounts, it is a typical notebook computer. The image shows part of the keyboard and the built-in pointing device, a touchpad. The touchpad design (or is it engineering?) is interesting. It is seamlessly embedded in the system chassis.

4 See http://scholar.google.com.

FIGURE 4.2

Form trumping function: (a) Notebook computer. (b) Duct tape provides tactile feedback indicating the edge of the touchpad.

The look is elegant—smooth, shiny, metallic. But something is wrong. Because the mounting is seamless and smooth, tactile feedback at the sides of the touchpad is missing. While positioning a cursor, the user has no sense of when his or her finger reaches the edge of the touchpad, except by observing that the cursor ceases to move. This is an example of form trumping function. One user's solution is shown in Figure 4.2b. Duct tape added on each side of the touchpad provides the all-important tactile feedback.5

Engineers and designers work in the world of products. The focus is on designing complete systems or products. Research is different. Research tends to be narrowly focused. Small ideas are conceived of, prototyped, tested, then advanced or discarded. New ideas build on previous ideas and, sooner or later, good ideas are refined into the building blocks—the materials and processes—that find their way into products. But research questions are generally small in scope. Research tends to be incremental, not monumental.

5 For an amusing example of function trumping form, visit Google Images using “Rube Goldberg simple alarm clock.”

FIGURE 4.3

Timeline for research, engineering, and design.

Engineers and designers also work with prototypes, but the prototype is used to assess alternatives at a relatively late stage: as part of product development. A researcher's prototype is an early mock-up of an idea, and is unlikely to directly appear in a product. Yet the idea of using prototypes to inform or assess is remarkably similar, whether for research or for product development. The following characterization by Tim Brown (CEO of design firm IDEO) is directed at designers, but is well aligned with the use of prototypes for research:

Prototypes should command only as much time, effort, and investment as are needed to generate useful feedback and evolve an idea. The more "finished" a prototype seems, the less likely its creators will be to pay attention to and profit from feedback. The goal of prototyping isn't to finish. It is to learn about the strengths and weaknesses of the idea and to identify new directions that further prototypes might take (Brown, 2008, p. 3).

One facet of research that differentiates it from engineering and design is the timeline. Research precedes engineering and design. Furthermore, the march forward for research is at a slower pace, without the shackles of deadlines. Figure 4.3 shows the timeline for research, engineering, and design. Products are the stuff of deadlines. Designers and engineers work within the corporate world, developing products that sell, and hopefully sell well. The raw materials for engineers and designers are materials and processes that already exist (dashed line in Figure 4.3) or emerge through research.

The computer mouse is a good example. It is a hugely successful product that, in many ways, defines a generation of computing, post-1981, when the Xerox Star was introduced. But in the 1960s the mouse was just an idea. As a prototype, it worked well as an input controller to maneuver a tracking symbol on a graphics display. Engelbart's invention (English et al., 1967) took nearly 20 years to be engineered and designed into a successful product.

Similar stories are heard today. Apple Computer Inc., long known as a leader in innovation, is always building a better mousetrap. An example is the iPhone, introduced in June 2007. And, evidently, the world has beaten a path to Apple's door.6 Notably, "with the iPhone, Apple successfully brought together decades of research" (Selker, 2008). Many of the raw materials of this successful product came by way of low-level research, undertaken well before Apple's engineers and designers set forth on their successful journey. Among the iPhone's interaction novelties is a two-finger pinch gesture for zooming in and out. New? Perhaps, but Apple's engineers and designers no doubt were guided or inspired by research that came before them. For example, multi-touch gestures date back to at least the 1980s (Buxton, Hill, and Rowley, 1985; Hauptmann, 1989). What about changing the aspect ratio of the display when the device is tilted? New? Perhaps not. Tilt, as an interaction technique for user interfaces, dates back to the 1990s (B. Harrison et al., 1998; Hinckley et al., 2000; Rekimoto, 1996). These are just two examples of research ideas that, taken alone, are small scale. While engineers and designers strive to build better systems or products, in the broadest sense, researchers provide the raw materials and processes engineers and designers work with: stronger steel for bridges, a better mouse for pointing, a better algorithm for a search engine, a more natural touch interface for mobile phones.

6 The entire quotation is "Build a better mousetrap and the world will beat a path to your door" and is attributed to American essayist Ralph Waldo Emerson (1803–1882).

4.2 What is empirical research?

By prefixing research with empirical, some powerful new ideas are added. According to one definition, empirical means originating in or based on observation or experience. Simple enough. Another definition holds that empirical means relying on experience or observation alone, often without due regard for system and theory. This is interesting. These words suggest researchers should be guided by direct observations and experiences about phenomena, without prejudice to, or even consideration of, existing theories. This powerful idea is a guiding principle in science—not to be blinded by preconceptions. Here's an example. Prior to the 16th century, there was a prevailing system or theory that celestial bodies revolved around the earth. The Polish scientist Nicolaus Copernicus (1473–1543) found evidence to the contrary. His work was empirical. It was based on observation, without bias toward, influence by, or due regard to, existing theory. He observed, he collected data, he looked for patterns and relationships in the data, and he found evidence within the data that cut across contemporary thinking. His empirical evidence led to one of the great achievements in modern science—a heliocentric cosmology that placed the sun, rather than the earth, at the center of the solar system. Now that's a nice discovery (see the third definition of research at the beginning of this chapter). In HCI and other fields of research, discoveries are usually more modest.

By another definition, empirical means capable of being verified or disproved by observation or experiment. These are strong words. An HCI research initiative is framed by hypotheses—assertions about the merits of an interface or an interaction technique. The assertions must be sufficiently clear and narrow to enable verification or disproval by gathering and testing evidence. This means using language in an assertion that speaks directly to empirical, observable, quantifiable aspects of the interaction. I will expand on this later in this chapter in the discussion on research questions.

4.3 Research methods

There are three common approaches, or methods, for conducting research in HCI and other disciplines in the natural and social sciences: the observational method, the experimental method, and the correlational method. All three are empirical as they are based on observation or experience. But there are differences and these follow from the objectives of the research and from the expertise and style of the researcher. Let’s examine each method.

4.3.1 Observational method

Observation is the starting point for this method. In conducting empirical research in HCI, it is essential to observe humans interacting with computers or computer-embedded technology of some sort. The observational method encompasses a collection of common techniques used in HCI research. These include interviews, field investigations, contextual inquiries, case studies, field studies, focus groups, think-aloud protocols, storytelling, walkthroughs, cultural probes, and so on. The approach tends to be qualitative rather than quantitative. As a result, observational methods achieve relevance while sacrificing precision (Sheskin, 2011, p. 76). Behaviors are studied by directly observing phenomena in a natural setting, as opposed to crafting constrained behaviors in an artificial laboratory setting. Real-world phenomena are high in relevance, but lack the precision available in controlled laboratory experiments.

Observational methods are generally concerned with discovering and explaining the reasons underlying human behavior. In HCI, this is the why or how of the interaction, as opposed to the what, where, or when. The methods focus on human thought, feeling, attitude, emotion, passion, sensation, reflection, expression, sentiment, opinion, mood, outlook, manner, style, approach, strategy, and so on. These human qualities can be studied through observational methods, but they are difficult to measure. The observations are more likely to involve note-taking, photographs, videos, or audio recordings rather than measurement. Measurements, if gathered, tend to use categorical data or simple counts of phenomena. Put another way, observational methods tend to examine and record the quality of interaction rather than quantifiable human performance.

4.3.2 Experimental method

With the experimental method (also called the scientific method), knowledge is acquired through controlled experiments conducted in laboratory settings. Acquiring knowledge may imply gathering new knowledge, but it may also mean studying existing knowledge for the purpose of verifying, refuting, correcting, integrating, or extending. In the relevance-precision dichotomy, it is clear where controlled experiments lie. Since the tasks are artificial and occur in a controlled laboratory setting, relevance is diminished. However, the control inherent in the methodology brings precision, since extraneous factors—the diversity and chaos of the real world—are reduced or eliminated.

A controlled experiment requires at least two variables: a manipulated variable and a response variable. In HCI, the manipulated variable is typically a property of an interface or interaction technique that is presented to participants in different configurations. Manipulating the variable simply refers to systematically exposing participants to different configurations of the interface or interaction technique. To qualify as a controlled experiment, at least two configurations are required. Thus, comparison is germane to the experimental method. This point deserves further elaboration. In HCI, we often hear of a system or design undergoing a "usability evaluation" or "user testing." Although these terms often have different meanings in different contexts, such evaluations or tests generally do not follow the experimental method. The reason is simple: there is no manipulated variable. This is mentioned only to distinguish a usability evaluation from a user study. Undertaking a user study typically implies conducting a controlled experiment where different configurations of a system are tested and compared. A "usability evaluation," on the other hand, usually involves assessing a single user interface for strengths and weaknesses. The evaluation might qualify as research ("collecting information about a particular subject"), but it is not experimental research. I will return to this point shortly. A manipulated variable is also called an independent variable or factor.

A response variable is a property of human behavior that is observable, quantifiable, and measurable. The most common response variable is time, often called task completion time or some variation thereof. Given a task, how long do participants take to do the task under each of the configurations tested? There are, of course, a multitude of other behaviors that qualify as response variables. Which ones are used depends on the characteristics of the interface or interaction technique studied in the research. A response variable is also called a dependent variable. Independent variables and dependent variables are explored in greater detail in Chapter 5.

HCI experiments involve humans, so the methodology employed is borrowed from experimental psychology, a field with a long history of research involving humans. In a sense, HCI is the beneficiary of this more mature field. The circumstances manipulated in a psychology experiment are often quite different from those manipulated in an HCI experiment, however. HCI is narrowly focused on the interaction between humans and computing technology, while experimental psychology covers a much broader range of the human experience.

It is naïve to think we can simply choose to focus on the experimental method and ignore qualities of interaction that are outside the scope of the experimental procedure. A full and proper user study—an experiment with human participants—involves more than just measuring and analyzing human performance. We engage observational methods by soliciting comments, thoughts, and opinions from participants. Even though a task may be performed quickly and with little or no error, if participants experience fatigue, frustration, discomfort, or another quality of interaction, we want to know about it. These qualities of interaction may not appear in the numbers, but they cannot be ignored.

One final point about the experimental method deserves mention. A controlled experiment, if designed and conducted properly, often allows a powerful form of conclusion to be drawn from the data and analyses. The relationship between the independent variable and the dependent variable is one of cause and effect; that is, the manipulations in the interface or interaction techniques are said to have caused the observed differences in the response variable. This point is elaborated in greater detail shortly. Cause-and-effect conclusions are not possible in research using the observational method or the correlational method.

4.3.3 Correlational method

The correlational method involves looking for relationships between variables. For example, a researcher might be interested in knowing if users' privacy settings in a social networking application are related to their personality, IQ, level of education, employment status, age, gender, income, and so on. Data are collected on each item (privacy settings, personality, etc.) and then relationships are examined. For example, it might be apparent in the data that users with certain personality traits tend to use more stringent privacy settings than users with other personality traits.

The correlational method is characterized by quantification, since the magnitude of variables must be ascertained (e.g., age, income, number of privacy settings). For nominal-scale variables, categories are established (e.g., personality type, gender). The data may be collected through a variety of methods, such as observation, interviews, on-line surveys, questionnaires, or measurement. Correlational methods often accompany experimental methods, if questionnaires are included in the experimental procedure. Do the measurements on response variables suggest relationships by gender, by age, by level of experience, and so on?

Correlational methods provide a balance between relevance and precision. Since the data were not collected in a controlled setting, precision is sacrificed. However, data collected using informal techniques, such as interviews, bring relevance—a connection to real-life experiences. Finally, the data obtained using correlational methods are circumstantial, not causal. I will return to this point shortly.
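The kind of relationship described above can be quantified with a correlation coefficient, such as Pearson's r. Below is a sketch in Python; the function is a standard textbook computation of r, and the paired values for age and number of privacy settings enabled are hypothetical, invented purely for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient between two
    equal-length sequences of ratio-scale measurements."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations: each user's age and the number of
# privacy settings he or she has enabled.
ages = [19, 23, 31, 38, 45, 52, 60]
settings = [2, 3, 3, 5, 4, 6, 7]

print(round(pearson_r(ages, settings), 2))  # 0.94, a strong positive relationship
```

Note that even a strong r, as here, is circumstantial evidence only; as stated above, correlational data do not support cause-and-effect conclusions.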

This book is primarily directed at the experimental method for HCI research. However, it is clear in the discussions above that the experimental method will often include observational methods and correlational methods.

4.4 Observe and measure

Let’s return to the foundation of empirical research: observation.

4.4.1 Observation

The starting point for empirical research in HCI is to observe humans interacting with computers. But how are observations made? There are two possibilities. Either another human is the observer or an apparatus is the observer. A human observer is the experimenter or investigator, not the human interacting with the computer. Observation is the precursor to measurement, and if the investigator is the observer, then measurements are collected manually. This could involve using a log sheet or notebook to jot down the number of events of interest observed. Events of interest might include the number of times the user clicked a button or moved his or her hand from the keyboard to the mouse. It might involve observing users in a public space and counting those who are using mobile phones in a certain way, for example, while walking, while driving, or while paying for groceries at a checkout counter. The observations may be broken down by gender or some other attribute of interest.

Manual observation could also involve timing by hand the duration of activities, such as the time to type a phrase of text or the time to enter a search query. One can imagine the difficulty in manually gathering measurements as just described, not to mention the inaccuracy in the measurements. Nevertheless, manual timing is useful for preliminary testing, sometimes called pilot testing.

More often in empirical research, the task of observing is delegated to the apparatus—the computer. Of course, this is a challenge in some situations. As an example, if the interaction is with a digital sports watch or automated teller machine (ATM), it is not possible to embed data collection software in the apparatus. Even if the apparatus is a conventional desktop computer, some behaviors of interest are difficult to detect. For example, consider measuring the number of times the user's attention switches from the display to the keyboard while doing a task. The computer is not capable of detecting this behavior. In this case, perhaps an eye tracking apparatus or camera could be used, but that adds complexity to the experimental apparatus. Another example is clutching with a mouse—lifting and repositioning the device. The data transmitted from a mouse to a host computer do not include information on clutching, so a conventional host system is not capable of observing and recording this behavior. Again, some additional apparatus or sensing technology may be devised, but this complicates the apparatus. Or a human observer can be used. So depending on the behaviors of interest, some ingenuity might be required to build an apparatus and collect the appropriate measurements.

If the apparatus includes custom software implementing an interface or interaction technique, then it is usually straightforward to record events such as key presses, mouse movement, selections, finger touches, or finger swipes and the associated timestamps. These data are stored in a file for follow-up analyses.
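Such software-based observation often amounts to little more than appending an event name and a timestamp as each input action occurs. A minimal sketch in Python (the class name, event names, and file format are illustrative assumptions, not a standard):

```python
import time

class EventLogger:
    """Accumulates (timestamp, event) pairs for follow-up analyses."""

    def __init__(self):
        self.events = []

    def log(self, name):
        # perf_counter() is a high-resolution monotonic clock,
        # suitable for timing user actions
        self.events.append((time.perf_counter(), name))

    def save(self, path):
        # one tab-delimited "timestamp<TAB>event" line per event
        with open(path, "w") as f:
            for t, name in self.events:
                f.write(f"{t:.4f}\t{name}\n")

log = EventLogger()
log.log("key_press")
log.log("mouse_move")
log.log("selection")
print(len(log.events))  # 3
```

In a real apparatus the log calls would be wired into the interface's event handlers; the saved file then supports computing response variables such as task completion time.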

4.4.2 Measurement scales

Observation alone is of limited value. Consider observations about rain and flowers. In some locales, there is ample rain but very few flowers in April. This is followed by less rain and a full-blown field of flowers in May. The observations may inspire anecdote (April showers bring May flowers), but a serious examination of patterns for rain and flowers requires measurement. In this case, an observer located in a garden would observe, measure, and record the amount of rain and

FIGURE 4.4

Scales of measurement: nominal, ordinal, interval, and ratio. Nominal measurements are considered simple, while ratio measurements are sophisticated.

the number of flowers in bloom. The measurements might be recorded each day during April and May, perhaps by several observers in several gardens. The measurements are collected, together with the means, tallied by month and analyzed for "significant differences" (see Chapter 6). With measurement, anecdotes turn to empirical evidence. The observer is now in a position to quantify the amount of rain and the number of flowers in bloom, separately for April and May. The added value of measurement is essential for science. In the words of engineer and physicist Lord Kelvin (1824–1907), after whom the Kelvin scale of temperature is named, "[Without measurement] your knowledge of it is of a meager and unsatisfactory kind."7

As elaborated in many textbooks on statistics, there are four scales of measurement: nominal, ordinal, interval, and ratio. Organizing this discussion by these four scales will help. Figure 4.4 shows the scales along a continuum with nominal-scale measurements as the least sophisticated and ratio-scale measurements as the most sophisticated. This follows from the types of computations possible with each measurement, as elaborated below.

The nature, limitations, and abilities of each scale determine the sort of information and analyses possible in a research setting. Each is briefly defined below.

4.4.3 Nominal

A measurement on the nominal scale involves arbitrarily assigning a code to an attribute or a category. The measurement is so arbitrary that the code needn't be a number (although it could be). Examples are automobile license plate numbers, codes for postal zones, job classifications, military ranks, etc. Clearly, mathematical manipulations on nominal data are meaningless. It is nonsense, for example, to compute the mean of several license plate numbers. Nominal data identify mutually exclusive categories. Membership or exclusivity is meaningful, but little else. The only relationship that holds is equivalence, which exists between entities in the same class. Nominal data are also called categorical data.

7 The exact and full quote, according to several online sources, is "When you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge of it is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely, in your thoughts, advanced it to the stage of science."

If we are interested in knowing whether males and females differ in their use of mobile phones, we might begin our investigation by observing people and assigning each a code of "M" for male, "F" for female. Here, the attribute is gender and the code is M or F. If we are interested in handedness, we might observe the writing habits of users and assign codes of "LH" for left-handers and "RH" for right-handers. If we are interested in scrolling strategies, we might observe users interacting with a GUI application and categorize them according to their scrolling methods, for example as "MW" for mouse wheel, "CD" for clicking and dragging the scroll bar, or "KB" for keyboard.

Nominal data are often used with frequencies or counts—the number of occurrences of each attribute. In this case, our research is likely concerned with the difference in the counts between categories: "Are males or females more likely to …?", "Do left-handers or right-handers have more difficulty with …?", or "Are Mac or PC users more inclined to …?" Bear in mind that while the attribute is categorical, the count is a ratio-scale measurement (discussed shortly).
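Tallying nominal codes like these is a simple frequency count. A short Python illustration using the scrolling-method codes introduced above (the sequence of observations itself is hypothetical):

```python
from collections import Counter

# Nominal codes assigned by an observer, one per user observed;
# MW, CD, KB follow the scrolling-method example in the text.
observations = ["MW", "KB", "MW", "CD", "MW", "KB", "MW"]

counts = Counter(observations)
print(counts["MW"])           # 4
print(counts.most_common(1))  # [('MW', 4)]
```

The codes themselves permit only equivalence comparisons, but the resulting counts are ratio-scale and support the usual arithmetic.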

Here is an example of nominal scale attributes using real data. Attendees of an HCI research course were dispatched to several locations on a university campus. Their task was to observe, categorize, and count students walking between classes. Each student was categorized by gender (male, female) and by whether he or she was using a mobile phone (not using, using). The results are shown in Figure 4.5. A total of 1,527 students were observed. The split by gender was roughly equal (51.1% male, 48.9% female). By mobile phone usage, 13.1 percent of the students (200) were observed using their mobile phone while walking.

The research question in Figure 4.5 is as follows: are males or females more likely to use a mobile phone as they walk about a university campus? I will demonstrate how to answer this question in Chapter 6 on Hypothesis Testing.

                 Mobile Phone Usage
Gender        Not Using      Using      Total        %
Male                683         98        781      51.1%
Female              644        102        746      48.9%
Total              1327        200       1527
%                 86.9%      13.1%

FIGURE 4.5

Two examples of nominal scale data: gender (male, female) and mobile phone usage (not using, using).

FIGURE 4.6

Example of a questionnaire item soliciting an ordinal response.

4.4.4 Ordinal data

Ordinal scale measurements provide an order or ranking to an attribute. The attribute can be any characteristic or circumstance of interest. For example, users might be asked to try three global positioning systems (GPS) for a period of time and then rank the systems by preference: first choice, second choice, third choice. Or users could be asked to consider properties of a mobile phone such as price, features, cool appeal, and usability, and then order the features by personal importance. One user might choose usability (first), cool appeal (second), price (third), and then features (fourth). The main limitation of ordinal data is that the interval is not intrinsically equal between successive points on the scale. In the example just cited, there is no innate sense of how much more important usability is over cool appeal or whether the difference is greater or less than that between, for example, cool appeal and price.

If we are interested in studying users' e-mail habits, we might use a questionnaire to collect data. Figure 4.6 gives an example of a questionnaire item soliciting ordinal data. There are five rankings according to the number of e-mail messages received per day. It is a matter of choice whether to solicit data in this manner or, in the alternative, to ask for an estimate of the number of e-mail messages received per day. It will depend on how the data are used and analyzed.

Ordinal data are slightly more sophisticated than nominal data, since comparisons of greater than or less than are possible. However, it is not valid to compute the mean of ordinal data.
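The distinction can be made concrete in code: order-based statistics such as the median are valid for ordinal data, while the mean is not. A short Python illustration with hypothetical preference ranks:

```python
import statistics

# Ranks (1 = first choice, 3 = third choice) assigned to one GPS unit
# by seven users. Hypothetical data for illustration.
ranks = [1, 2, 1, 3, 2, 1, 2]

# Order comparisons and the median are meaningful for ordinal data.
print(statistics.median(ranks))  # 2

# A mean such as statistics.mean(ranks) (about 1.71) would wrongly treat
# the distance between ranks 1 and 2 as equal to that between 2 and 3.
```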

4.4.5 Interval data

Moving up in sophistication, interval data have equal distances between adjacent values. However, there is no absolute zero. The classic example of interval data is temperature measured on the Fahrenheit or Celsius scale. Unlike ordinal data, it is meaningful to compute the mean of interval data, for example, the mean mid-day temperature during the month of July. Ratios of interval data are not meaningful, however. For example, one cannot say that 20°C is twice as warm as 10°C.

In HCI, interval data are commonly used in questionnaires where a response on a linear scale is solicited. An example is a Likert Scale (see Figure 4.7), where verbal responses are given a numeric code. In the example, verbal responses are

Please indicate your level of agreement with the following statements.

                                      Strongly  Mildly             Mildly  Strongly
                                      disagree  disagree  Neutral  agree   agree
-----------------------------------------------------------------------------------
It is safe to talk on a mobile
phone while driving.                      1        2         3       4        5

It is safe to read a text message
on a mobile phone while driving.          1        2         3       4        5

It is safe to compose a text message
on a mobile phone while driving.          1        2         3       4        5

FIGURE 4.7

A set of questionnaire items organized in a Likert Scale. The responses are examples of interval scale data.

symmetric about a neutral, central value with the gradations between responses more or less equal. It is this last quality—equal gradations between responses—that validates calculating the mean of the responses across multiple respondents.

There is some disagreement among researchers on the assumption of equal gradations between the items in Figure 4.7. Do respondents perceive the difference between, say, 1 and 2 (strongly disagree and mildly disagree) the same as the difference between, say, 2 and 3 (mildly disagree and neutral)? Attaching verbal tags to numbers is likely to bring qualitative and highly personal interpretations to the responses. There is evidence that respondents perceive items at the extremes of the scale as farther apart than items in the center (Kaptein, Nass, and Markopoulos, 2010). Nevertheless, the gradation between responses is much more similar here than between the five ordinal responses in Figure 4.6. One remedy for non-equal gradations in Likert-scale response items is simply to instruct respondents to interpret the items as equally spaced.
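Granting the equal-gradation assumption, summarizing a Likert-scale item across respondents is a simple mean of the numeric codes. A Python illustration with hypothetical responses to a single item:

```python
import statistics

# Coded responses (1 = strongly disagree ... 5 = strongly agree) from
# ten respondents to one Likert-scale item. Hypothetical values.
responses = [2, 3, 4, 2, 5, 3, 3, 4, 2, 2]

# With interval data, the mean is a valid summary statistic.
print(statistics.mean(responses))  # 3
```

The same computation on the ordinal rankings of Figure 4.6 would not be valid, since those response categories lack equal spacing.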

Examples of Likert Scale questionnaire items in HCI research papers are as follows: Bickmore and Picard, 2004; Dautenhahn et al., 2006; Garau et al., 2003; Guy, Ur, Ronen, Perer, and Jacovi, 2011; Wobbrock, Chau, and Myers, 2007.

4.4.6 Ratio data

Ratio-scale measurements are the most sophisticated of the four scales of measurement. Ratio data have an absolute zero and support a myriad of calculations to summarize, compare, and test the data. Ratio data can be added, subtracted, multiplied, divided; means, standard deviations, and variances can be computed. In HCI, the most common ratio-scale measurement is time—the time to complete a task. But generally, all physical measurements are also ratio-scale, such as the distance or velocity of a cursor as it moves across a display, the force applied by a finger on a touchscreen, and so on. Many social variables are also ratio-scale, such as a user's age or years of computer experience.

Another common ratio-scale measurement is count (noted above). Often in HCI research, we count the number of occurrences of certain human activities, such as the number of button clicks, the number of corrective button clicks, the number of characters entered, the number of incorrect characters entered, the number of times an option is selected, the number of gaze shifts, the number of hand movements between the mouse and keyboard, the number of task retries, the number of words in a search query, etc. Although we tend to give time special attention, it too is a count—the number of seconds or minutes elapsed as an activity takes place. These are all ratio-scale measurements.

The expressive nature of a count is improved through normalization; that is, expressing the value as a count per something. So, for example, knowing that a 10-word phrase was entered in 30 seconds is less revealing than knowing that the rate of entry was 10 / 0.5 = 20 words per minute (wpm). The main benefit of normalizing counts is to improve comparisons. It is easy to compare 20 wpm for one method with 23 wpm for another method—the latter method is faster. It is much harder to compare 10 words entered in 30 seconds for one method with 14 words entered in 47 seconds for another method.

As another example, let’s say two errors were committed while entering a 50-character phrase of text. Reporting the occurrence of two errors reveals very little, unless we also know the length of the phrase. Even so, comparisons with results from another study are difficult. (What if the other study used phrases of different lengths?) However, if the result is reported as a 2 / 50 = 4% error rate, there is an immediate sense of the meaning, magnitude, and relevance of the human performance measured, and as convention has it, the other study likely reported error rates in much the same way. So where possible, normalize counts to make the measurements more meaningful and to facilitate comparisons.
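The two normalizations described above are simple enough to express as one-line helpers. A minimal sketch (the function names are my own, invented for illustration):

```python
def words_per_minute(words, seconds):
    """Normalize an entry count to a rate: words per minute."""
    return words / (seconds / 60)

def error_rate_percent(errors, total_chars):
    """Normalize an error count to a percentage of characters entered."""
    return errors / total_chars * 100

# The worked examples from the text:
print(words_per_minute(10, 30))    # 10 words in 30 s -> 20.0 wpm
print(error_rate_percent(2, 50))   # 2 errors in 50 characters -> 4.0 %
```

Reporting 20 wpm and a 4% error rate lets readers compare the result against other studies directly, which raw counts do not.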

An example in the literature is an experiment comparing five different text entry methods (Magerkurth and Stenzel, 2003). For speed, results were reported in “words per minute” (that’s fine); however, for accuracy, results were reported as the number of errors committed. Novice participants, for example, committed 24 errors while using multi-tap (Magerkurth and Stenzel, 2003, Table 2). While this number is useful for comparing results within the experiment, it provides no insight as to how the results compare with those in related research. The results would be more enlightening if normalized for the amount of text entered and reported as an “error rate (%),” computed as the number of character errors divided by the total number of characters entered times 100.

4.5 Research questions


In HCI, we conduct experimental research to answer (and raise!) questions about a new or existing user interface or interaction technique. Often the questions pertain to the relationship between two variables, where one variable is a circumstance or condition that is manipulated (an interface property) and the other is an observed and measured behavioral response (task performance).

The notion of posing or answering questions seems simple enough, but this is tricky because of the human element. Unlike an algorithm operating on a data set, where the time to search, sort, or whatever is the same with each try, people exhibit variability in their actions. This is true both from person to person and for a single person repeating a task. The result is always different! This variability affects the confidence with which we can answer research questions. To gauge the confidence of our answers, we use statistical techniques, as presented in Chapter 6, Hypothesis Testing.
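To make the variability concrete, here is a small sketch (with invented trial times) that summarizes repeated measurements of one task and attaches a rough confidence interval to the mean. The 1.96 multiplier is a normal approximation only; the formal treatment belongs to Chapter 6:

```python
import math
import statistics

# Hypothetical times (s) for one participant repeating the same task five times.
# Note the spread: no two trials are the same.
trials = [11.9, 13.4, 12.2, 14.0, 12.6]

mean = statistics.mean(trials)
se = statistics.stdev(trials) / math.sqrt(len(trials))  # standard error of the mean

# A rough 95% confidence interval (normal approximation).
low, high = mean - 1.96 * se, mean + 1.96 * se
print(f"mean = {mean:.2f} s, 95% CI roughly [{low:.2f}, {high:.2f}]")
```

The wider the interval, the less confidently we can answer a question such as “is method A faster than method B?” from these data alone.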

Research questions emerge from an inquisitive process. The researcher has an idea and wishes to see if it has merit. Initial thoughts are fluid and informal:

● Is it viable?
● Is it as good as or better than current practice?
● What are its strengths and weaknesses?
● Which of several alternatives is best?
● What are the human performance limits and capabilities?
● Does it work well for novices, for experts?
● How much practice is required to become proficient?

These questions are unquestionably relevant, since they capture a researcher’s thinking at the early stages of a research project. However, the questions above suffer a serious deficiency: they are not testable. The goal, then, is to move forward from the loose and informal questions above to questions more suitable for empirical and experimental enquiry.

I’ll use an example to show how this is done. Perhaps a researcher is interested in text entry on touchscreen phones. Texting is something people do a lot. The researcher is experienced with the Qwerty soft keyboard on touchscreen phones, but finds it error prone and slow. Having thought about the problem for a while, an idea emerges for a new technique for entering text. Perhaps it’s a good idea. Perhaps it’s really good, better than the basic Qwerty soft keyboard (QSK). Being motivated to do research in HCI, the researcher builds a prototype of the entry technique and fiddles with the implementation until it works fine. The researcher decides to undertake some experimental research to evaluate the idea. What are the research questions? Perhaps the following capture the researcher’s thinking:

● Is the new technique any good?
● Is the new technique better than QSK?
● Is the new technique faster than QSK?
● Is the new technique faster than QSK after a bit of practice?
● Is the measured entry speed (in words per minute) higher for the new technique than for QSK after one hour of use?

From top to bottom, the questions are progressively narrower and more focused. Expressions like “any good” or “better than,” although well intentioned, are problematic for research. Remember observation and measurement? How does one measure “better than”? Farther down the list, the questions address qualities that are more easily observed and measured. Furthermore, since they are expressed across alternative designs, comparisons are possible. The last question speaks very specifically to entry speed measured in words per minute, to a comparison between two methods, and to a criterion for practice. This is a testable research question.

4.6 Internal validity and external validity

At this juncture we are in a position to consider two important properties of experimental research: internal validity and external validity. I’ll use the research questions above to frame the discussion. Two of the questions appear in the plot in Figure 4.8. The x-axis is labeled Breadth of Question or, alternatively, External Validity. The y-axis is labeled Accuracy of Answer or, alternatively, Internal Validity.

The question

Is the new technique better than QSK?

is positioned as high in breadth (that’s good!) yet answerable with low accuracy (that’s bad!). As already noted, this question is not testable in an empirical sense. Attempts to answer it directly are fraught with problems, because we lack a methodology to observe and measure “better than” (even though finding better interfaces is the final goal).

FIGURE 4.8

Graphical comparison of Internal Validity and External Validity. (The x-axis is Breadth of Question, alternatively External Validity, running from low to high; the y-axis is Accuracy of Answer, alternatively Internal Validity, also from low to high. The question “Is the measured entry speed (in words per minute) higher with the new technique than with QSK after one hour of use?” sits at high accuracy but low breadth.)


The other, more detailed question

Is the measured entry speed (in words per minute) higher with the new technique than with QSK after one hour of use?

is positioned as low in breadth (that’s bad!) yet answerable with high accuracy (that’s good!). The question is testable, which means we can craft a methodology to answer it through observation and measurement. Unfortunately, the narrow scope of the question brings different problems. Focusing on entry speed is fine, but what about other aspects of the interaction? What about accuracy, effort, comfort, cognitive load, user satisfaction, practical use of the technique, and so on? The question excludes consideration of these, hence the low breadth rating.

The alternative labels for the axes in Figure 4.8 are internal validity and external validity. In fact, the figure was designed to set up discussion on these important terms in experimental research.

Internal validity (definition) is the extent to which an effect observed is due to the test conditions. For the example, an effect is simply the difference in entry speed between the new technique and QSK. If we conduct an experiment to measure and compare the entry speed for the two techniques, we want confidence that the difference observed was actually due to inherent differences between the techniques. Internal validity captures this confidence. Perhaps the difference was due to something else, such as variability in the responses of the participants in the study. Humans differ. Some people are predisposed to be meticulous, while others are carefree, even reckless. Furthermore, human behavior—individually or between people—can change from one moment to the next, for no obvious reason. Were some participants tested early in the day, others late in the day? Were there any distractions, interruptions, or other environmental changes during testing? Suffice it to say that any source of variation beyond that due to the inherent properties of the test conditions tends to compromise internal validity. High internal validity means the effect observed really exists.

External validity (definition) is the extent to which experimental results are generalizable to other people and other situations. Generalizable clearly speaks to breadth in Figure 4.8. To the extent the research pursues broadly framed questions, the results tend to be broadly applicable. But there is more. Research results that apply to “other people” imply that the participants involved were representative of a larger intended population. If the experiment used 18- to 25-year-old computer literate college students, the results might generalize to middle-aged computer literate professionals. But they might not generalize to middle-aged people without computer experience. And they likely would not apply to the elderly, to children, or to users with certain disabilities. In experimental research, random sampling is important for generalizability; that is, the participants selected for testing were drawn at random from the desired population.

Generalizable to “other situations” means the experimental environment and procedures were representative of real world situations where the interface or

FIGURE 4.9

There is tension between internal validity and external validity. Improving one comes at the expense of the other.

(Sketch courtesy of Bartosz Bajer)

technique will be used. If the research studied the usability of a GPS system for taxi drivers or delivery personnel and the experiment was conducted in a quiet, secluded research lab, there may be a problem with external validity. Perhaps a different experimental environment should be considered. Research on text entry where participants enter predetermined text phrases with no punctuation symbols, no uppercase characters, and without any ability to correct mistakes may have a problem with external validity. Again, a different experimental procedure should be considered.

The scenarios above are overly dogmatic. Experiment design is an exercise in compromise. While we may speak in the strictest terms about high internal validity and high external validity, in practice one is achieved at the expense of the other, as characterized in Figure 4.9.

To appreciate the tension between internal and external validity, two additional examples are presented. The first pertains to the experimental environment. Consider an experiment that compares two remote pointing devices for presentation systems. To improve external validity, the experimental environment mimics expected usage. Participants are tested in a large room with a large presentation-size display, they stand, and they are positioned a few meters from the display. Other participants are engaged to act as an audience by attending and sitting around tables in the room during testing. There is no doubt this environment improves external validity. But what about internal validity? Some participants may be distracted or intimidated by the audience. Others might have a tendency to show off, impress, or act out. Such behaviors introduce sources of variation outside the realm of the devices under test, and thereby compromise internal validity. So our effort to improve external validity through environmental considerations may negatively impact internal validity.

A second example pertains to the experimental procedure. Consider an experiment comparing two methods of text entry. In an attempt to improve external validity, participants are instructed to enter whatever text they think of. The text may include punctuation symbols and uppercase and lowercase characters, and participants can edit the text and correct errors as they go. Again, external validity is improved since this is what people normally do when entering text. However, internal validity is compromised because behaviors are introduced that are not directly related to the text entry techniques—behaviors such as pondering (What should


I enter next?) and fiddling with commands (How do I move the cursor back and make a correction? How is overtype mode invoked?). Furthermore, since participants generate the text, errors are difficult to record since there is no “source text” with which to compare the entered text. So here again we see the compromise. The desire to improve external validity through procedural considerations may negatively impact internal validity.

Unfortunately, there is no universal remedy for the tension between internal and external validity. At the very least, one must acknowledge the limitations. Formulating conclusions that are broader than what the results suggest is sure to raise the ire of reviewers. We can strive for the best of both worlds with a simple approach, however. Posing multiple narrow (testable) questions that cover the range of outcomes influencing the broader (untestable) questions will increase both internal and external validity. For example, a technique that is fast, accurate, easy to learn, easy to remember, and considered comfortable and enjoyable by users is generally better. Usually there is a positive correlation between the testable and untestable questions; i.e., participants generally find one UI better than another if it is faster and more accurate, takes fewer steps, is more enjoyable, is more comfortable, and so on.

Before moving on, it is worth mentioning ecological validity, a term closely related to external validity. The main distinction is in how the terms are used. Ecological validity refers to the methodology (using materials, tasks, and situations typical of the real world), whereas external validity refers to the outcome (obtaining results that generalize to a broad range of people and situations).

4.7 Comparative evaluations

Evaluating new ideas for user interfaces or interaction techniques is central to research in human-computer interaction. However, evaluations in HCI sometimes focus on a single idea or interface. The idea is conceived, designed, implemented, and evaluated—but not compared. The research component of such an evaluation is questionable. Or, to the extent the exercise is labeled research, it is more aligned with the second definition of research noted earlier: “collecting information about a particular subject.”

From a research perspective, our third definition is more appealing, since it includes the ideas of experimentation, discovery, and developing theories of interaction. Certainly, more meaningful and insightful results are obtained if a comparative evaluation is performed. In other words, a new user interface or interaction technique is designed and implemented and then compared with one or more alternative designs to determine which is faster, more accurate, less confusing, more preferred by users, etc. The alternatives may be variations in the new design, an established design (a baseline condition), or some combination of the two. In fact, the testable research questions above are crafted as comparisons (e.g., “Is Method A faster than Method B for …?”), and for good reason. A controlled experiment must include at least one independent variable and the independent variable must have at

FIGURE 4.10

Including a baseline condition serves as a check on the methodology and facilitates the comparison of results between user studies.

least two levels or test conditions. Comparison, then, is inherent in research following the experimental method discussed earlier. The design of HCI experiments is elaborated further in Chapter 5.

The idea of including an established design as a baseline condition is particularly appealing. There are two benefits. First, the baseline condition serves as a check on the methodology. Baseline conditions are well traveled in the research literature, so results in a new experiment are expected to align with previous results. Second, the baseline condition allows results to be compared with other studies. The general idea is shown in Figure 4.10. The results from two hypothetical user studies are shown. Both user studies are comparative evaluations and both include condition A as a baseline. Provided the methodology was more or less the same, the performance results in the two studies should be the same or similar for the baseline condition. This serves not only as a check on the methodology but also facilitates comparisons between the two user studies. A quick look at the charts suggests that condition C out-performs condition B. This is an interesting observation because condition C was evaluated in one study, condition B in another.

Consider the idea cited earlier of comparing two remote pointing devices for presentation systems. Such a study would benefit by including a conventional mouse as a baseline condition.8 If the results for the mouse are consistent with those found in other studies, then the methodology was probably okay, and the results for the remote pointing devices are likely valid. Furthermore, conclusions can often be expressed in terms of the known baseline condition, for example, “Device A was found to be about 8 percent slower than a conventional mouse.”
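Expressing a result relative to a baseline, as in the quoted conclusion, is a simple normalization. A sketch with hypothetical timing data (the 1.25 s and 1.35 s means are invented here to reproduce the 8 percent figure):

```python
def percent_difference(value, baseline):
    """Difference from a baseline condition, as a percentage of the baseline."""
    return (value - baseline) / baseline * 100

# Hypothetical mean pointing times (s): baseline mouse vs. remote device A.
mouse, device_a = 1.25, 1.35

slower_by = percent_difference(device_a, mouse)
print(f"Device A was about {slower_by:.0f} percent slower than the mouse")
```

Stating results this way lets readers who know typical mouse performance immediately place the new device, even without reading the full results tables.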

The value in conducting a comparative study was studied in research by Tohidi et al. (2006), who tested the hypothesis that a comparative evaluation yields more insight than a one-off evaluation. In their study, participants were assigned to groups and were asked to manually perform simple tasks with climate control interfaces

8 The example cited earlier on remote pointing devices included a conventional mouse as a baseline condition (MacKenzie and Jusoh, 2001).


(i.e., thermostats). There were three different interfaces tested. Some of the participants interacted with just one interface, while others did the same tasks with all three interfaces. The participants interacting with all three interfaces consistently found more problems and were more critical of the interfaces. They were also less prone to inflate their subjective ratings. While this experiment was fully qualitative—human performance was not measured or quantified—the message is the same: a comparative evaluation yields more valuable and insightful results than a single-interface evaluation.

4.8 Relationships: circumstantial and causal

I noted above that looking for and explaining interesting relationships is part of what we do in HCI research. Often a controlled experiment is designed and conducted specifically for this purpose, and if done properly a particular type of conclusion is possible. We can often say that the condition manipulated in the experiment caused the changes in the human responses that were observed and measured. This is a cause-and-effect relationship, or simply a causal relationship.

In HCI, the variable manipulated is often a nominal-scale attribute of an interface, such as device, entry method, feedback modality, selection technique, menu depth, button layout, and so on. The variable measured is typically a ratio-scale human behavior, such as task completion time, error rate, or the number of button clicks, scrolling events, gaze shifts, etc.

Finding a causal relationship in an HCI experiment yields a powerful conclusion. If the human response measured is vital in HCI, such as the time it takes to do a common task, then knowing that a condition tested in the experiment reduces this time is a valuable outcome. If the condition is an implementation of a novel idea and it was compared with current practice, there may indeed be reason to celebrate. Not only has a causal relationship been found, but the new idea improves on existing practice. This is the sort of outcome that adds valuable knowledge to the discipline; it moves the state of the art forward.9 This is what HCI research is all about!

Finding a relationship does not necessarily mean a causal relationship exists. Many relationships are circumstantial. They exist, and they can be observed, measured, and quantified. But they are not causal, and any attempt to express the relationship as such is wrong. The classic example is the relationship between smoking and cancer. Suppose a research study tracks the habits and health of a large number of people over many years. This is an example of the correlational method of research mentioned earlier. In the end, a relationship is found between smoking and cancer: cancer is more prevalent in the people who smoked. Is it correct to conclude from the study that smoking causes cancer? No. The relationship observed is circumstantial, not causal. Consider this: when the data are examined more closely, it is discovered that the tendency to develop cancer is also related to other variables in the data set. It seems the people who developed cancer also tended to drink more alcohol, eat more fatty foods, sleep less, listen to rock music, and so on. Perhaps it was the increased consumption of alcohol that caused the cancer, or the consumption of fatty foods, or something else. The relationship is circumstantial, not causal. This is not to say that circumstantial relationships are not useful. Looking for and finding a circumstantial relationship is often the first step in further research, in part because it is relatively easy to collect data and look for circumstantial relationships.

9 Reporting a non-significant outcome is also important, particularly if there is reason to believe a test condition might improve an interface or interaction technique. Reporting a non-significant outcome means that, at the very least, other researchers needn’t pursue the idea further.

Causal relationships emerge from controlled experiments. Looking for a causal relationship requires a study where, among other things, participants are selected randomly from a population and are randomly assigned to test conditions. A random assignment ensures that each group of participants is the same or similar in all respects except for the conditions under which each group is tested. Thus, the differences that emerge are more likely due to (caused by) the test conditions than to environmental or other circumstances. Sometimes participants are balanced into groups where the participants in each group are screened so that the groups are equal in terms of other relevant attributes. For example, an experiment testing two input controllers for games could randomly assign participants to groups or balance the groups to ensure the range of gaming experience is approximately equal.
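The two assignment strategies described above, random and balanced, can be sketched in a few lines (the participant IDs and the gaming-experience attribute are invented for illustration):

```python
import random

random.seed(1)  # fixed seed so the example is reproducible

participants = [f"P{i:02d}" for i in range(1, 17)]  # 16 hypothetical participants

# Random assignment: shuffle the pool, then split into two equal groups.
shuffled = participants[:]
random.shuffle(shuffled)
group_a, group_b = shuffled[:8], shuffled[8:]

# Balanced assignment: rank participants by a screened attribute
# (hypothetical hours of gaming per week), then alternate down the
# ranking so both groups span the full range of experience.
experience = {p: random.randint(0, 20) for p in participants}
ranked = sorted(participants, key=lambda p: experience[p])
balanced_a, balanced_b = ranked[0::2], ranked[1::2]
```

Either way, every participant ends up in exactly one group; the balanced variant trades some randomness for groups that are matched on the screened attribute.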

Here is an HCI example similar to the smoking versus cancer example: A researcher is interested in comparing multi-tap and predictive input (T9) for text entry on a mobile phone. The researcher ventures into the world and approaches mobile phone users, asking for five minutes of their time. Many agree. They answer a few questions about experience and usage habits, including their preferred method of entering text messages. Fifteen multi-tap users and 15 T9 users are found. The users are asked to enter a prescribed phrase of text while they are timed. Back in the lab, the data are analyzed. Evidently, the T9 users were faster, entering at a rate of 18 words per minute, compared to 12 words per minute for the multi-tap users. That’s 50 percent faster for the T9 users! What is the conclusion? There is a relationship between method of entry and text entry speed; however, the relationship is circumstantial, not causal. It is reasonable to report what was done and what was found, but it is wrong to venture beyond what the methodology gives. Concluding from this simple study that T9 is faster than multi-tap would be wrong. Upon inspecting the data more closely, it is discovered that the T9 users tended to be more tech-savvy: they reported considerably more experience using mobile phones, and also reported sending considerably more text messages per day than the multi-tap users who, by and large, said they didn’t like sending text messages and did so very infrequently.10 So the difference observed may be due to prior experience and usage habits, rather than to inherent differences in the text entry methods. If there is a genuine interest in determining if one text entry method is faster than another, a controlled experiment is required. This is the topic of the next chapter.

10 Although it is more difficult to determine, perhaps technically savvy users were more willing to participate in the study. Perhaps the users who declined to participate were predominantly multi-tap users.

One final point deserves mention. Cause and effect conclusions are not possible in certain types of controlled experiments. If the variable manipulated is a naturally occurring attribute of participants, then cause and effect conclusions are unreliable. Examples of naturally occurring attributes include gender (female, male), personality (extrovert, introvert), handedness (left, right), first language (e.g., English, French, Spanish), political viewpoint (left, right), and so on. These attributes are legitimate independent variables, but they cannot be manipulated, which is to say, they cannot be assigned to participants. In such cases, a cause and effect conclusion is not valid because it is not possible to avoid confounding variables (defined in Chapter 5). Being a male, being an extrovert, being left-handed, and so on always brings forth other attributes that systematically vary across levels of the independent variable. Cause and effect conclusions are unreliable in these cases because it is not possible to know whether the experimental effect was due to the independent variable or to the confounding variable.

4.9 Research topics

Most HCI research is not about designing products. It’s not even about designing applications for products. In fact, it’s not even about design or products. Research in HCI, like in most fields, tends to nip away at the edges. The march forward tends to be incremental. The truth is, most new research ideas tend to build on existing ideas and do so in modest ways. A small improvement to this, a little change to that. When big changes do arise, they usually involve bringing to market, through engineering and design, ideas that already exist in the research literature. Examples are the finger flick and two-finger gestures used on touchscreen phones. Most users likely encountered these for the first time with the Apple iPhone. The gestures seem like bold new advances in interaction, but, of course, they are not. The flick gesture dates at least to the 1960s. Flicks are clearly seen in use with a light pen in the videos of Sutherland’s Sketchpad, viewable on YouTube. They are used to terminate a drawing command. Two-finger gestures date at least to the 1970s. Figure 4.11 shows Herot and Weinzapfel’s (1978) two-finger gesture used to rotate a virtual knob on a touch-sensitive display. As reported, the knob can be rotated to within 5 degrees of a target position. So what might seem like a bold new advance is often a matter of good engineering and design, using ideas that already exist.

Finding a research topic is often the most challenging step for graduate students in HCI (and other fields). The expression “ABD,” for “all but dissertation,” is a sad reminder of this predicament. Graduate students sometimes find themselves in a position of having finished all degree requirements (e.g., coursework, a teaching practicum) without nailing down the big topic for dissertation research. Students might be surprised to learn that seasoned researchers in universities and industry also struggle for that next big idea. Akin to writer’s block, the harder one tries, the

FIGURE 4.11

A two-finger gesture on a touch-sensitive display is used to rotate a virtual knob.

(Adapted from Herot and Weinzapfel, 1978)

less likely the idea is to appear. I will present four tips to overcome “researcher’s block” later in this section. First, I present a few observations on ideas and how and where they arise.

4.9.1 Ideas

In the halcyon days after World War II, there was an American television show, a situation comedy, or sitcom, called The Many Loves of Dobie Gillis (1959–1963). Much like Seinfeld many years later, the show was largely about, well, nothing. Dobie’s leisurely life mostly focused on getting rich or on endearing a beautiful woman to his heart. Each episode began with an idea, a scheme. The opening scene often placed Dobie on a park bench beside The Thinker, the bronze and marble statue by French sculptor Auguste Rodin (1840–1917). (See Figure 4.12.) After some pensive moments by the statue, Dobie’s idea, his scheme, would come to him. It would be nice if research ideas in HCI were similarly available and with such assurance as were Dobie’s ideas. That they are not is no cause for concern, however. Dobie’s plans usually failed miserably, so we might question his approach to formulating his plans. Is it possible that The Thinker, in his pose, is more likely to inspire writer’s block than the idea so desperately sought? The answer may be yes, but there is little science here. We are dealing with human thought, inspiration, creativity, and a milieu of other human qualities that are poorly understood, at best.

If working hard to find a good idea doesn’t work, perhaps a better approach is to relax and just get on with one’s day. This seems to have worked for the ancient Greek scholar Archimedes (287–212 BC), who is said to have effortlessly come upon a brilliant idea as a solution to a problem. As a scientist, Archimedes was called upon to determine if King Hiero’s crown was pure gold or if it was compromised with a lesser alloy. One solution was to melt the crown, separating the constituent parts. This would destroy the crown–not a good idea. Archimedes’ idea was simple, and he is said to have discovered it while taking a bath. Yes, taking a bath, rather than sitting for hours in The Thinker’s pose. He realized–in an instant–that the volume of water displaced as he entered the bathtub must equal the volume

FIGURE 4.12

Rodin’s The Thinker often appeared in the opening scenes of the American sitcom The Many Loves of Dobie Gillis.

of his body. Immersing the crown in water would similarly yield the crown’s volume, and this, combined with the crown’s weight, would reveal the crown’s density. If the density of the crown equaled the known density of gold, the King’s crown was pure gold–problem solved. According to the legend, Archimedes was so elated at his moment of revelation that he jumped from his bath and ran nude about the streets of Syracuse shouting “Eureka!” (“I found it!”).

While legends make good stories, we are not likely to be as fortunate as Archimedes in finding a good idea for an HCI research topic. Inspiration is not always the result of a single moment of revelation. It is often gradual, with sources unknown or without a conscious and recognizable connection to the problem. Recall Vannevar Bush’s memex, described in the opening chapter of this book. Memex was a concept. It was never built, even though Bush described the interaction with memex in considerable detail. We know memex today as hypertext and the World Wide Web. But where and how did Bush get his idea? The starting point is having a problem to solve. The problem of interest to Bush was coping with ever-expanding volumes of information. Scientists like Bush needed a convenient way to access this information. But how? It seems Bush’s inspiration for memex came from… Let’s pause for a moment, lest we infer Bush was engaged in a structured approach to problem solving. It is not likely that Bush went to work one morning intent on solving the problem of information access. More than likely, the idea came without deliberate effort. It may have come flittingly, in an instant, or gradually, over days, weeks, or months. Who knows? What is known, however, is that the idea did not arise from nothing. Ideas come from the human experience. This is why in HCI we often read about things like “knowledge in the head and knowledge in the world”

FIGURE 4.13

Pie menus in HCI: (a) The inspiration? (b) HCI example.

(Adapted from G. Kurtenbach, 1993)

(Norman, 1988, ch. 3) or metaphor and analogy (Carroll and Thomas, 1982). The context for inspiration is the human experience. So what was the source of Bush’s inspiration for memex? The answer is in Bush’s article, and also in Chapter 1.

Are there other examples relevant to HCI? Sure. Twitter co-founder Jack Dorsey is said to have come up with the idea for the popular micro-blogging site while sitting on a children’s slide in a park eating Mexican food.11 What about pie menus in graphical user interfaces? Pie menus, as an alternative to linear menus, were first proposed by Don Hopkins at the University of Maryland in 1988 (cited in Callahan et al., 1988). We might wonder about the source of Hopkins’ inspiration (see Figure 4.13).

See also student exercises 4-2 and 4-3 at the end of this chapter.

4.9.2 Finding a topic

It is no small feat to find an interesting research topic. In the following paragraphs, four tips are offered on finding a topic suitable for research. As with the earlier discussion on the cost and frequency of errors (see Figure 3.46), there is little science to offer here. The ideas follow from personal experience and from working with students and other researchers in HCI.

4.9.3 Tip #1: Think small!

At a conference recently, I had an interesting conversation with a student. He was a graduate student in HCI. “Have you found a topic for your research?” I asked. “Not really,” he said. He had a topic, but only in a broad sense. It seems his supervisor had funding for a large research project related to aviation. The topic, in a general sense, was to develop an improved user interface for an air traffic control system. He was stuck. Where to begin? Did I have any ideas for him? Well, actually, no I didn’t. Who wouldn’t be stuck? The task of developing a UI for an air traffic control system is huge. Furthermore, the project mostly involves engineering and

11. New York Times, Oct. 30, 2010, p. BU1.

design. Where is the research in designing an improved system of any sort? What are the research questions? What are the experimental variables? Unfortunately, graduate students are often saddled with similar big problems because a supervisor’s funding source requires it. The rest of our discussion focused on narrowing the problem—in a big way. Not to some definable sub-system, but to a small aspect of the interface or interaction. The smaller, the better.

The point above is to think small. On finding that big idea, the advice is… forget it. Once you shed that innate desire to find something really significant and important, it’s amazing what will follow. If you have a small idea, something that seems a little useful, it’s probably worth pursuing as a research project. Pursue it and the next thing you know, three or four related interaction improvements come to mind. Soon enough, there’s a dissertation topic in the works. So don’t hesitate to think small.

4.9.4 Tip #2: Replicate!

An effective way to get started on research is to replicate an existing experiment from the HCI literature. This seems odd. Where is the research in simply replicating what has already been done? Of course, there is none. But there is a trick. Having taught HCI courses many times over many years, I know that students frequently get stuck finding a topic for the course’s research project. Students frequently approach me for suggestions. If I have an idea that seems relevant to the student’s interests, I’ll suggest it. Quite often (usually!) I don’t have any particular idea. If nothing comes to mind, I take another approach. The student is advised just to study the HCI literature—research papers from the CHI proceedings, for example—and find some experimental research on a topic of interest. Then just replicate the experiment. Is that okay, I am asked. Sure, no problem.

The trick is in the path to replicating. Replicating a research experiment requires a lot of work. The process of studying a research paper and precisely determining what was done, then implementing it, testing it, debugging it, doing an experiment around it, and so on will empower the student—the researcher—with a deep understanding of the issues, much deeper than simply reading the paper. This moves the line forward. The stage is set. Quite often, a new idea, a new twist, emerges. But it is important not to require something new. The pressure in that may backfire. Something new may emerge, but this might not happen until late in the process, or after the experiment is finished. So it is important to avoid a requirement for novelty. This is difficult, because it is germane to the human condition to strive for something new. Self-doubt may bring the process to a standstill. So keep the expectations low. A small tweak here, a little change there. Good enough. No pressure. Just replicate. You may be surprised with the outcome.

4.9.5 Tip #3: Know the literature!

It might seem obvious, but the process of reviewing research papers on a topic of interest is an excellent way to develop ideas for research projects. The starting point is identifying the topic in a general sense. If one finds gaming of interest, then gaming is the topic. If one finds social networking of interest, then that’s the topic. From there the task is to search out and aggressively study and analyze all published research on the topic. If there are too many publications, then narrow the topic. What, in particular, is the interest in gaming or social networking? Continue the search. Use Google Scholar, the ACM Digital Library, or whatever resource is conveniently available. Download all the papers, store them, organize them, study them, make notes, then open a spreadsheet file and start tabulating features from the papers. In the rows, identify the papers. In the columns, tabulate aspects of the interface or interaction technique, conditions tested, results obtained, and so on. Organize the table in whatever manner seems reasonable.
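The tabulation just described can also be scripted rather than done by hand in a spreadsheet. Below is a minimal sketch in Python; the paper names, column headings, and values are hypothetical placeholders for illustration, not data from any real study:

```python
import csv

# Hypothetical literature table: one row per paper, one column per aspect
# of the study (technique, participants, results, notes, and so on).
papers = [
    {"paper": "Smith 2010", "technique": "tilt input", "participants": 12,
     "speed_wpm": 9.5, "notes": "no error correction"},
    {"paper": "Lee 2012", "technique": "two-thumb typing", "participants": 16,
     "speed_wpm": 14.2, "notes": "errors reported as a percentage"},
]

# Write the table to a CSV file that any spreadsheet program can open.
with open("lit_review.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(papers[0].keys()))
    writer.writeheader()
    writer.writerows(papers)
```

As more papers are found and analyzed, each becomes one more entry in the list; new categories of inquiry become new field names.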

The process is chaotic at first. Where to begin? What are the issues? The task is daunting, at the very least, because of the divergence in reporting methods. But that’s the point. The gain is in the process—bringing shape and structure to the chaos. The table will grow as more papers are found and analyzed. There are examples of such tables in published papers, albeit in a condensed summary form. Figure 4.14 shows an example from a research paper on text entry using small keyboards. The table amounts to a mini literature review. Although the table is neat and tidy, don’t be fooled. It emerged from a difficult and chaotic process of reviewing a collection of papers and finding common and relevant issues. The collection of notes in the right-hand column is evidence of the difficulty. This column is like a disclaimer, pointing out issues that complicate comparisons of the data in the other columns.

Are there research topics lurking within Figure 4.14? Probably. But the point is the process, not the product. Building such a table shapes the research area into relevant categories of inquiry. Similar tables are found in other research papers (e.g., Figure 11 and Figure 12 in MacKenzie, 1992; Table 3 and Table 4 in Soukoreff and MacKenzie, 2004). See also student exercise 4-4 at the end of this chapter.

4.9.6 Tip #4: Think inside the box!

The common expression “think outside the box” is a challenge to all. The idea is to dispense with accepted beliefs and assumptions (in the box) and to think in a new way that assumes nothing and challenges everything. However, there is a problem with the challenge. Contemporary, tech-savvy people, clever as they are, often believe they in fact do think outside the box, and that it is everyone else who is confined to life in the box. With this view, the challenge is lost before starting. If there is anything useful in tip #4, it begins with an unsavory precept: You are inside the box! All is not lost, however. Thinking inside the box, then, is thinking about and challenging one’s own experiences—the experiences inside the box. The idea is simple. Just get on with your day, but at every juncture, every interaction, think and question. What happened? Why did it happen? Is there an alternative? Play the

1st Study Author | Number of Keys (a) | Direct/Indirect | Scanning | Number of Participants | Speed (b) (wpm) | Notes
Bellman [2] | 5 | Indirect | No | 11 | 11 | 4 cursor keys + SELECT key. Error rates not reported. No error correction method.
Dunlop [4] | 4 | Direct | No | 12 | 8.90 | 4 letter keys + SPACE key. Error rates reported as “very low.”
Dunlop [5] | 4 | Direct | No | 20 | 12 | 4 letter keys + 1 key for SPACE/NEXT. Error rates not reported. No error correction method.
Tanaka-Ishii [25] | 3 | Direct | No | 8 | 12+ | 4 letter keys + 4 keys for editing and selecting. 5 hours training. Error rates not reported. Errors corrected using CLEAR key.
Gong [7] | 3 | Direct | No | 32 | 8.01 | 3 letter keys + two additional keys. Error rate = 2.1%. Errors corrected using DELETE key.
MacKenzie [16] | 3 | Indirect | No | 10 | 9.61 | 2 cursor keys + SELECT key. Error rate = 2.2%. No error correction method.
Baljko [1] | 2 | Indirect | Yes | 12 | 3.08 | 1 SELECT key + BACKSPACE key. 43 virtual keys. RC scanning. Same phrase entered 4 times. Error rate = 18.5%. Scanning interval = 750 ms.
Simpson [24] | 1 | Indirect | Yes | 4 | 4.48 | 1 SELECT key. 26 virtual keys. RC scanning. Excluded trials with selection errors or missed selections. No error correction. Scanning interval = 525 ms at end of study.
Koester [10] | 1 | Indirect | Yes | 3 | 7.2 | 1 SELECT key. 33 virtual keys. RC scanning with word prediction. Dictionary size not given. Virtual BACKSPACE key. 10 blocks of trials. Error rates not reported. Included trials with selection errors or missed selections. Fastest participant: 8.4 wpm.

a. For “direct” entry, the value is the number of letter keys. For “indirect” entry, the value is the total number of keys.
b. The entry speed cited is the highest of the values reported in each source, taken from the last block if multiple blocks.

FIGURE 4.14

Table summarizing studies of text entry using small keyboards.

(From MacKenzie, 2009b, Table 1; consult for full details on studies cited)

FIGURE 4.15

Elevator control panel. The button label is more prominent than the button.

role of both a participant (this is unavoidable) and an observer. Observe others, of course, but more importantly observe yourself. You are in the box, but have a look, study, and reconsider.

Here’s an example, which on the surface seems trivial (but see tip #1). Recently, while at work at York University, I was walking to my class on Human-Computer Interaction. Being a bit late, I was in a hurry. The class was in a nearby building on the third floor and I was carrying some equipment. I entered the elevator and pushed the button—the wrong button. Apparently, for each floor the control panel has both a button label and a button. (See Figure 4.15.) I pushed the button label instead of the button. A second later I pushed the button, and my journey continued. End of story.

Of course, there is more. Why did I push the wrong button? Yes, I was in a hurry, but that’s not the full reason. With a white number on a black background, the floor is identified more prominently by the button label than by the button. And the button label is round, like a button. On the button, the number is recessed in the metal and is barely visible. The error was minor, only a slip (right intention, wrong action; see Norman, 1988, ch. 5). Is there a research topic in this? Perhaps. Perhaps not. But experiencing, observing, and thinking about one’s interactions with technology can generate ideas and promote a humbling yet questioning frame of thinking—thinking that moves forward into research topics. The truth is, I have numerous moments like this every day (and so do you!). Most amount to nothing, but the small foibles in interacting with technology are intriguing and worth thinking about.

In this chapter, we have examined the scientific foundations for research in human-computer interaction. With this, the next challenge is in designing and conducting experiments using human participants (users) to evaluate new ideas for user interfaces and interaction techniques. We explore these topics in Chapter 5.

STUDENT EXERCISES

4-1. Examine some published papers in HCI and find examples where results were reported as a raw count (e.g., number of errors) rather than as a count per something (e.g., percent errors). Find three examples and write a brief report (or prepare a brief presentation) detailing how the results were reported and the weakness or limitation in the method. Propose a better way to report the same results. Use charts or graphs where appropriate.

4-2. What, in Vannevar Bush’s “human experience,” formed the inspiration for memex? (If needed, review Bush’s essay “As We May Think,” or see the discussion in Chapter 1.) What are the similarities between his inspiration and memex?

4-3. A fisheye lens or fisheye view is a tool or concept in HCI whereby high-value information is presented in greater detail than low-value information. Furnas first introduced the idea in 1986 (Furnas, 1986). Although the motivation was to improve the visualization of large data sets, such as programs or databases, Furnas’ idea came from something altogether different. What was Furnas’ inspiration for fisheye views? Write a brief report describing the analogy offered by Furnas. Include in your report three examples of fisheye lenses, as described and implemented in subsequent research, noting in particular the background motivation.

4-4. Here are some research themes: 3D gaming, mobile phone use while driving, privacy in social networking, location-aware user interfaces, tactile feedback in pointing and selecting, multi-touch tabletop interaction. Choose one of these topics (or another) and build a table similar to that in Figure 4.14. Narrow the topic, if necessary (e.g., mobile phone texting while driving), and find at least five relevant research papers to include in the table. Organize the table identifying the papers in the rows and methods, relevant themes, and findings in the columns. Write a brief report about the table. Include citations and references to the selected papers.

4-5. In Chapter 3, we used a 2D plot to illustrate the trade-off between the frequency of errors (x-axis) and the cost of errors (y-axis) (see Figure 3.46). The plot was just a sketch, since the analysis was informal. In this chapter, we discussed another trade-off, that between form and function. The


HUMAN-COMPUTER INTERACTION, 1985, Volume 1, pp. 311-338

Copyright © 1985, Lawrence Erlbaum Associates, Inc.

Direct Manipulation Interfaces

Edwin L. Hutchins, James D. Hollan, and Donald A. Norman

University of California, San Diego

ABSTRACT

Direct manipulation has been lauded as a good form of interface design, and some interfaces that have this property have been well received by users. In this article we seek a cognitive account of both the advantages and disadvantages of direct manipulation interfaces. We identify two underlying phenomena that give rise to the feeling of directness. One deals with the information processing distance between the user’s intentions and the facilities provided by the machine. Reduction of this distance makes the interface feel direct by reducing the effort required of the user to accomplish goals. The second phenomenon concerns the relation between the input and output vocabularies of the interface language. In particular, direct manipulation requires that the system provide representations of objects that behave as if they are the objects themselves. This provides the feeling of directness of manipulation.

A version of this paper also appears as a chapter in the book, User Centered System Design: New Perspectives on Human-Computer Interaction (Norman & Draper, 1986). Authors’ present address: Edwin L. Hutchins, James D. Hollan, and Donald A. Norman, Institute for Cognitive Science, University of California at San Diego, La Jolla, CA 92093.


CONTENTS

1. DIRECT MANIPULATION

1.1. Early Examples of Direct Manipulation
1.2. The Goal: A Cognitive Account of Direct Manipulation

2. TWO ASPECTS OF DIRECTNESS: DISTANCE AND ENGAGEMENT

2.1. Distance
2.2. Direct Engagement

3. TWO FORMS OF DISTANCE: SEMANTIC AND ARTICULATORY

3.1. Semantic Distance
3.2. Semantic Distance in the Gulfs of Execution and Evaluation

The Gulf of Execution
The Gulf of Evaluation

3.3. Reducing the Semantic Distance That Must Be Spanned

Higher-Level Languages
Make the Output Show Semantic Concepts Directly
Automated Behavior Does Not Reduce Semantic Distance
The User Can Adapt to the System Representation
Virtuosity and Semantic Distance

3.4. Articulatory Distance
3.5. Articulatory Distance in the Gulfs of Execution and Evaluation

4. DIRECT ENGAGEMENT
5. A SPACE OF INTERFACES
6. PROBLEMS WITH DIRECT MANIPULATION

1. DIRECT MANIPULATION

The best way to describe a direct manipulation interface is by example. Suppose we have a set of data to be analyzed with the numbers stored in matrix form. Their source and meaning are not important for this example. The numbers could be the output of a spreadsheet, a matrix of numerical values from the computations of a conventional programming language, or the results of an experiment. Our goal is to analyze the numbers, to see what relations exist among the rows and columns of the matrix. The matrix of numbers is represented on a computer display screen by an icon. To plot one column against another, simply get a copy of a graph icon, then draw a line from the output of one column to the x-axis input of the graph icon and another line from the output of the second column to the y-axis input (see Figure 1). Not what was wanted? Erase the lines and reconnect them. Want to see other graphs? Make more copies of the graph icons and connect them. Need a logarithmic transformation of one of the axes? Move up a function icon, type in the algebraic function that is desired (y = log x, in this case) and connect it in the desired data stream. Want the analysis of variance of the logarithm of the data? Connect the matrix to the appropriate statistical icons. These examples are illustrated in Figure 1B.
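For contrast, here is roughly what the same analysis looks like in the conventional text-based style that the dataflow icons replace. This is a sketch in plain Python with hypothetical data; the original example names no particular language or data set:

```python
import math

# A hypothetical data matrix: rows are observations, columns are variables.
matrix = [[1.0, 10.0],
          [2.0, 100.0],
          [3.0, 1000.0]]

# "Draw a line" from column 0 to the x-axis input of the graph.
x = [row[0] for row in matrix]

# Route column 1 through a logarithmic "function icon" before graphing.
y = [math.log10(row[1]) for row in matrix]

# A basic statistics "icon": per-column means.
means = [sum(col) / len(col) for col in zip(*matrix)]

print(x)      # → [1.0, 2.0, 3.0]
print(y)      # → approximately [1.0, 2.0, 3.0]
print(means)  # → [2.0, 370.0]
```

The point of direct manipulation is that the wiring above is done by dragging and connecting icons on the screen rather than by composing text like this.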

Figure 1. An elementary example of doing simple statistical computations by direct manipulation. (A) The basic components: The data are contained in the matrix, represented by the icon in the upper left corner of the screen. At the bottom of the screen are basic icons that represent possible functions. To use one, a copy of the desired icon is moved to the screen and connected up, as is shown for the graph. (B) More complex interconnections, including the use of a logarithmic transformation of the data, a basic statistical package (for means and standard deviations), and an Analysis of Variance Package (ANOVA).

Now consider how we could partition the data. Suppose one result of our analysis was the scatter diagram shown in Figure 2. The straight line that has been fitted through the points is clearly inappropriate. The data fall into two quite different clusters and it would be best to analyze each cluster separately. In the actual data matrix, the points that form the two clusters might be scattered randomly throughout the data set. The regularities are apparent only when we plot them. How do we pull out the clusters? Suppose we could simply circle the points of interest in the scatter plot and use each circled set as if it were a new matrix of values, each of which could be analyzed in standard ways, as shown in Figure 2B.

The examples of Figures 1 and 2 illustrate a powerful manipulation medium for computation. The promise of direct manipulation is that instead of an abstract computational medium, all the “programming” is done graphically, in a form that matches the way one thinks about the problem. The desired operations are performed simply by moving the appropriate icons onto the screen and connecting them together. Connecting the icons is the equivalent of writing a program or calling on a set of statistical subroutines, but with the advantage of being able to directly manipulate and interact with the data and the connections. There are no hidden operations, no syntax or command names to learn. What you see is what you get. Some classes of syntax errors are eliminated. For example, you can’t point at a nonexistent object. The system requires expertise in the task domain, but only minimal knowledge of the computer or of computing.

The term direct manipulation was coined by Shneiderman (1974, 1982, 1983) to refer to systems having the following properties:

  1. Continuous representation of the object of interest.
  2. Physical actions or labeled button presses instead of complex syntax.
  3. Rapid incremental reversible operations whose impact on the object of interest is immediately visible. (Shneiderman, 1982, p. 251)

Direct manipulation interfaces seem remarkably powerful. Shneiderman (1982) has suggested that direct manipulation systems have the following virtues:

  1. Novices can learn basic functionality quickly, usually through a demonstration by a more experienced user.
  2. Experts can work extremely rapidly to carry out a wide range of tasks, even defining new functions and features.
  3. Knowledgeable intermittent users can retain operational concepts.
  4. Error messages are rarely needed.
  5. Users can see immediately if their actions are furthering their goals, and if not, they can simply change the direction of their activity.

Figure 2. (A) The scatter plot formed in Figure 1, along with the best fitting regression line to the data. It is clear that the data really fall into two quite distinct clusters and that it would be best to look at each independently. (B) The clusters are analyzed by circling the desired data, then treating the group of circled data as if they were a new matrix of values, which can be treated as a data source and analyzed in standard ways.

  6. Users have reduced anxiety because the system is comprehensible and because actions are so easily reversible. (Shneiderman, 1982, p. 251)

Can this really be true? Certainly there must be problems as well as benefits. It turns out that the concept of direct manipulation is complex. Moreover, although there are important benefits there are also costs. Like everything else, direct manipulation systems trade off one set of virtues and vices against another. It is important that we understand these trade-offs. A checklist of surface features is unlikely to capture the real sources of power in direct manipulation interfaces.

1.1. Early Examples of Direct Manipulation

Hints of direct manipulation programming environments have been around for quite some time. The first major landmark is Sutherland’s Sketchpad, a graphical design program (Sutherland, 1963). Sutherland’s goal was to devise a program that would make it possible for a person and a computer “to converse rapidly through the medium of line drawings.” Sutherland’s work is a landmark not only because of historical priority but because of the ideas that he helped develop: He was one of the first to discuss the power of graphical interfaces, the conception of a display as “sheets of paper,” the use of pointing devices, the virtues of constraint representations, and the importance of depicting abstractions graphically.

Sutherland’s ideas took 20 years to have widespread impact. The lag is perhaps due more to hardware limitations than anything else. Highly interactive, graphical programming requires the ready availability of considerable computational power, and it is only recently that machines capable of supporting this type of computational environment have become inexpensive enough to be generally available. Now we see these ideas in many of the computer-aided design and manufacturing systems, many of which can trace their heritage directly to Sutherland’s work. Borning’s ThingLab program (1979) explored a general programming environment, building upon many of Sutherland’s ideas within the Smalltalk programming environment. More recently direct manipulation systems have been appearing with reasonable frequency. For example, Bill Budge’s Pinball Construction Set (Budge, 1983) permits a user to construct an infinite variety of electronic pinball games by directly manipulating graphical objects that represent the components of the game surface. Other examples exist in the area of intelligent training systems (e.g., the Steamer system of Hollan, Hutchins, & Weitzman, 1984; Hollan, Stevens, & Williams, 1980). Steamer makes use of similar techniques and also provides tools for the construction of interactive graphical interfaces. Finally, spreadsheet programs incorporate many of the essential features of direct manipulation. In the lead article of Scientific American’s special issue on computer software, Kay (1984) claims that the development of dynamic spreadsheet systems gives strong hints that programming styles are in the offing that will make programming as it has been done for the past 40 years—that is, by composing text that represents instructions—obsolete.

1.2. The Goal: A Cognitive Account of Direct Manipulation

We see promise in the notion of direct manipulation, but as yet we see no explanation of it. There are systems with attractive features, and claims for the benefits of systems that give the user a certain sort of feeling, and even lists of properties that seem to be shared by systems that provide that feeling, but no account of how particular properties might produce the feeling of directness. The purpose of this article is to examine the underlying basis for direct manipulation systems. On the one hand, what is it that provides the feeling of “directness?” Why do direct manipulation systems feel so natural? What is so compelling about the notion? On the other hand, why can using such systems sometimes seem so tedious?

For us, the notion of “direct manipulation” is not a unitary concept, nor even something that can be quantified in itself. It is an orienting notion. “Directness” is an impression or a feeling about an interface. What we seek to do here is to characterize the space of interfaces and see where within that picture the range of phenomena that contribute to the feeling of directness might reside. The goal is to give cognitive accounts of these phenomena. At the root of our approach is the assumption that the feeling of directness results from the commitment of fewer cognitive resources. Or, put the other way around, the need to commit additional cognitive resources in the use of an interface leads to the feeling of indirectness. As we shall see, some of the production of the feeling of directness is due to adaptation by the user, so that the designer can neither completely control the process, nor take full credit for the feeling of directness that may be experienced by the user.

We will not attempt to set down hard and fast criteria under which an interface can be classified as direct or not direct. The sensation of directness is always relative; it is often due to the interaction of a number of factors. There are costs associated with every factor that increases the sensation of directness. At present we know of no way to measure the trade-off values, but we will attempt to provide a framework within which one can say what is being traded off against what.

2. TWO ASPECTS OF DIRECTNESS: DISTANCE AND ENGAGEMENT

There are two distinct aspects of the feeling of directness. One involves a notion of the distance between one’s thoughts and the physical requirements of the system under use. A short distance means that the translation is simple and straightforward, that thoughts are readily translated into the physical actions required by the system and that the system output is in a form readily interpreted in terms of the goals of interest to the user. We will use the term directness to refer to the feeling that results from interaction with an interface. The term distance will be used to describe factors which underlie the generation of the feeling of directness.

The second aspect of directness concerns the qualitative feeling of engagement, the feeling that one is directly manipulating the objects of interest. There are two major metaphors for the nature of human-computer interaction, a conversation metaphor and a model-world metaphor. In a system built on the conversation metaphor, the interface is a language medium in which the user and system have a conversation about an assumed, but not explicitly represented world. In this case, the interface is an implied intermediary between the user and the world about which things are said. In a system built on the model-world metaphor, the interface is itself a world where the user can act, and which changes state in response to user actions. The world of interest is explicitly represented and there is no intermediary between user and world. Appropriate use of the model-world metaphor can create the sensation in the user of acting upon the objects of the task domain themselves. We call this aspect of directness direct engagement.

2.1. Distance

We call one underlying aspect of directness distance to emphasize the fact that directness is never a property of the interface alone, but involves a relationship between the task the user has in mind and the way that task can be accomplished via the interface. Here the critical issues involve minimizing the effort required to bridge the gulf between the user’s goals and the way they must be specified to the system.

An interface introduces distance to the extent there are gulfs between a person’s goals and knowledge and the level of description provided by the systems with which the person must deal. These are referred to as the gulf of execution and the gulf of evaluation (Figure 3). The gulf of execution is bridged by making the commands and mechanisms of the system match the thoughts and goals of the user. The gulf of evaluation is bridged by making the output displays present a good conceptual model of the system that is readily perceived, interpreted, and evaluated. The goal in both cases is to minimize cognitive effort.

We suggest that the feeling of directness is inversely proportional to the amount of cognitive effort it takes to manipulate and evaluate a system and, moreover, that cognitive effort is a direct result of the gulfs of execution and evaluation. The better the interface to a system helps bridge the gulfs, the less cognitive effort needed and the more direct the resulting feeling of interaction.

2.2. Direct Engagement

The description of the nature of interaction to this point begins to suggest how to make a system less difficult to use, but it misses an important point, a point that is the essence of direct manipulation. The analysis of the execution and evaluation process explains why there is difficulty in using a system, and it says something about what must be done to minimize the mental effort required to use a system. But there is more to it than that. The systems that best exemplify direct manipulation all give the qualitative feeling that one is directly engaged with control of the objects—not with the programs, not with the computer, but with the semantic objects of our goals and intentions. This is the feeling that Laurel (1986) discusses: a feeling of first-personness, of direct engagement with the objects that concern us. Are we analyzing data? Then we should be manipulating the data themselves; or if we are designing an analysis of data, we should be manipulating the analytic structures themselves. Are we

Figure 3. The gulfs of execution and evaluation. Each gulf is unidirectional: The gulf of execution goes from goals to system state; the gulf of evaluation goes from system state to goals.

playing a game? Then we should be manipulating directly the game world, touching and controlling the objects in that world, with the output of the sys tem responding directly to our actions, and in a form compatible with them.

Historically, most interfaces have been built on the conversation metaphor. There is power in the abstractions that language provides (we discuss some of this later), but the implicit role of interface as an intermediary to a hidden world denies the user direct engagement with the objects of interest. Instead, the user is in direct contact with linguistic structures, structures that can be interpreted as referring to the objects of interest, but that are not those objects themselves. Making the central metaphor of the interface that of the model world supports the feeling of directness. Instead of describing the actions of interest, the user performs those actions. In a conventional interface, the system describes the results of the actions. In a model world the system directly presents the actions taken upon the objects. This change in central metaphor is made possible by relatively recent advances in technology. One of the exciting prospects for the study of direct manipulation is the exploration of the properties of systems that provide for direct engagement.

Building interfaces based on the model-world metaphor requires a special sort of relationship between the input interface language and the output interface language. In particular, the output language must represent its subject of discourse in a way that natural language does not normally do. The expressions of a direct manipulation output language must behave in such a way that the user can assume that they, in some sense, are the things they refer to. DiSessa (1985) calls this "naive realism." Furthermore, the nature of the relationship between input and output language must be such that an output expression can serve as a component of an input expression. Draper (1986) has coined the term inter-referential I/O to refer to relationships between input and output in which an expression in one can refer to an expression in the other. When these conditions are met, it is as if we are directly manipulating the things that the system represents.

Thus, consider a system in which a file is represented by an image on the screen and actions are done by pointing to and manipulating the image. In this case, if we can specify a file by pointing at the screen representation, we have met the goal that an expression in the output language (in this case, an image) be allowed as a component of the input expression (in this case, by pointing at the screen representation). If we ask for a listing of files, we would want the result to be a representation that can, in turn, be used directly to specify the further operations to be done. Notice that this is not how a conversation works. In conversation, one may refer to what has been said previously, but one cannot operate upon what has been said. This requirement does not necessarily imply an interface of pictures, diagrams, or icons. It can be done with words and descriptions. The key properties are that the objects, whatever their form, have behaviors and can be referred to by other objects, and that referring to an object causes it to behave. In the file-listing example, we must be able to use the output expression that represents the file in question as a part of the input expression calling for whatever operation we desire upon that file, and the output expression that represents the file must change as a result of being referred to in this way. The goal is to permit the user to act as if the representation is the thing itself. These conditions are met in many screen editors when the task is the arrangement of strings of characters. The characters appear as they are typed. They are then available for further operations. We treat them as though they are the things we are manipulating. These conditions are also met in the statistics example with which we opened this article (Figure 1), and in Steamer. The special conditions are not met in file-listing commands on most systems, the commands that allow one to display the names and attributes of file structure. The issue is that the outputs of these commands are simply "names" of the objects, and operating on the names does nothing to the objects to which the names refer. In a direct manipulation situation, we would feel that we had the files in front of us, that the program that "listed" the files actually placed the files before us. Any further operation on the files would take place upon the very objects delivered by the directory-listing command. This would provide the feeling of directly manipulating the objects that were returned.

The point is that when an interface presents a world of behaving objects rather than a language of description, manipulating a representation can have the same effects and the same feel as manipulating the thing being represented. The members of the audience of a well-staged play willfully suspend their beliefs that the players are actors and become directly engaged in the content of the drama. In a similar way, the user of a well-designed model-world interface can willfully suspend belief that the objects depicted are artifacts of some program and can thereby directly engage the world of the objects. This is the essence of the "first-personness" feeling of direct engagement. Let us now return to the issue of distance and explore the ways that an interface can be direct or indirect with respect to a particular task.
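The inter-referential requirement described above can be sketched in code. This is a minimal illustration under assumed names (`FileObject` and `ModelWorld` are hypothetical, not from the article): a model-world listing returns the objects themselves, so an output expression can serve directly as a component of an input expression, whereas a name-based listing returns strings that are causally disconnected from the files they denote.

```python
# A minimal sketch (not from the article) contrasting a name-based listing
# with an inter-referential, model-world listing. All names are hypothetical.

class FileObject:
    """An output expression that, in effect, IS the thing it refers to:
    it can appear in a listing (output) and be operated on (input)."""
    def __init__(self, name):
        self.name = name
        self.highlighted = False

    def select(self):
        # Referring to the object causes it to behave: its on-screen
        # representation changes as a result of being referred to.
        self.highlighted = True
        return self

    def rename(self, new_name):
        self.name = new_name  # the displayed object itself changes
        return self

class ModelWorld:
    def __init__(self, names):
        self.files = [FileObject(n) for n in names]

    def listing(self):
        # The listing returns the objects themselves, so an output
        # expression can serve directly as a component of an input one.
        return self.files

# Conversational style: the listing yields only name strings; operating
# on a name does nothing to the file it denotes.
names = ["draft.txt", "notes.txt"]
shouted = names[0].upper()  # transforms the name, not the file

# Model-world style: the same expression flows from output back to input.
world = ModelWorld(["draft.txt", "notes.txt"])
target = world.listing()[0]           # output expression...
target.select().rename("final.txt")   # ...used as an input expression
```

The design point the sketch tries to capture is only the causal link: in the model world, referring to a listed object changes that very object, which is what the name-based listing lacks.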

3. TWO FORMS OF DISTANCE: SEMANTIC AND ARTICULATORY


Whenever we interact with a device, we are using an interface language. That is, we must use a language to describe to the device the nature of the actions we wish to have performed. This is true regardless of whether we are dealing with an interface based on the conversation metaphor or on the model-world metaphor, although the properties of the language in the two cases are different. A description of desired actions is an expression in the interface language.

The notion of an interface language is not confined to the everyday meaning of language. Setting a switch or turning a steering wheel can be expressions in an interface language if switch setting or wheel turning are how one specifies the operations that are to be done. After an action has been performed, evaluation of the outcome requires that the device make available some indication of what has happened: that output is an expression in the output interface language. Output interface languages are often impoverished. Frequently the output interface language does not share vocabulary with the input interface language. Two forms of interface language (two dialects, if you will) must exist to span the gulfs between user and device: the input interface language and the output interface language.

Both the languages people speak and computer programming languages are almost entirely symbolic in the sense that there is an arbitrary relationship between the form of a vocabulary item and its meaning. The reference relationship is established by convention and must be learned. There is no way to infer meaning from form for most vocabulary items. Because of the relative independence of meaning and form, we describe separately two properties of interface languages: semantic distance and articulatory distance. Figure 4 summarizes the relationship between semantic and articulatory distance. In the following sections we treat each of these distances separately and discuss them in relation to the gulfs of execution and evaluation.

3.1. Semantic Distance

Semantic distance concerns the relation of the meaning of an expression in the interface language to what the user wants to say. Two important questions about semantic distance are: (1) Is it possible to say what one wants to say in this language? That is, does the language support the user's conception of the task domain? Does it encode the concepts and distinctions in the domain in the same way that the user thinks about them? (2) Can the things of interest be said concisely? Can the user say what is wanted in a straightforward fashion, or must the user construct a complicated expression to do what appears in the user's thoughts as a conceptually simple piece of work?

Figure 4.

Every expression in the interface language has a meaning and a form. Semantic distance reflects the relationship between the user's intentions and the meaning of expressions in the interface languages, both for input and output. Articulatory distance reflects the relationship between the physical form of an expression in the interaction language and its meaning, again, both for input and output. The easier it is to go from the form or appearance of the input or output to meaning, the smaller the articulatory distance.

Semantic distance is an issue with all languages. Natural languages generally evolve such that they have rich vocabularies for domains that are of importance to their speakers. When a person learns a new language, especially when the language is from a different culture, the new language may seem indirect, requiring complicated constructs to describe things the learner thinks should be easy to say. But the differences in apparent directness reflect differences in what things are thought important in the two cultures. Natural languages can and do change as the need arises. This occurs through the introduction of new vocabulary or by changing the meaning of existing terms. The result is to make the language semantically more direct with respect to the topic of interest.

3.2. Semantic Distance in the Gulfs of Execution and Evaluation

Beware the Turing tar-pit in which everything is possible but nothing of interest is easy (Perlis, 1982, p. 10).

The Gulf of Execution

At the highest level of description, a task may be described by the user's intention: "compose this piece" or "format this paper." At the lowest level of description, the performance of the task consists of the shuffling of bits inside the machine. Between the interface and the low-level operations of the machine is the system-provided task-support structure that implements the expressions in the interface language. The situation that Perlis (1982) called the "Turing tar-pit" is one in which the interface language lies near or at the level of bit shuffling of a very simple abstract machine. In this case, the entire burden of spanning the gulf from user intention to bit manipulation is carried by the user. The relationship between the user's intention and the organization of the instructions given to the machine is distant, complicated, and hard to follow. Where the machine is of minimal complexity, as is the case with the Turing machine example, the wide gulf between user intention and machine instructions must be filled by the user's extensive planning and translation activities. These activities are difficult and rife with opportunities for error.

Semantic directness requires matching the level of description required by the interface language to the level at which the person thinks of the task. It is always the case that the user must generate some information-processing structure to span the gulf. Semantic distance in the gulf of execution reflects how much of the required structure is provided by the system and how much by the user. The more that the user must provide, the greater the distance to be bridged.

The Gulf of Evaluation

On the evaluation side, semantic distance refers to the amount of processing structure that is required for the user to determine whether the goal has been achieved. If the terms of the output are not those of the user's intention, the user will be required to translate the output into terms that are compatible with the intention in order to make the evaluation. For example, suppose a user's intent is to control how fast the water level in a tank rises. The user does some controlling action and observes the output. But if the output only shows the current value, the user has to observe the value over time and mentally compare the values at different times to see what the rate of change is (see Figure 5). The information needed for the evaluation is in the output, but it is not there in a form that directly fits the terms of the evaluation. The burden is on the user to perform the required transformations, and that requires effort. Suppose the rate of change were directly displayed, as in Figure 5B. This indication reduces the mental workload, making the semantic distance between intentions and output language much shorter.
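The mental computation that a level-only display forces on the user, and the transformation a rate display performs instead, can be sketched as follows. This is a hypothetical illustration, not code from Steamer; the names `observed_rate` and `RateDisplay` are invented for the example.

```python
# A sketch (not from Steamer) of the two displays described above.
# With a level-only meter, the user must difference successive readings;
# a rate display moves that computation into the interface.

def observed_rate(levels, dt):
    """What the user must compute mentally from a level-only meter:
    compare two readings taken dt seconds apart."""
    return (levels[-1] - levels[-2]) / dt

class RateDisplay:
    """A semantically more direct display: it shows the rate itself."""
    def __init__(self, dt):
        self.dt = dt
        self.last = None
        self.rate = 0.0

    def update(self, level):
        # The interface, not the user, performs the differencing.
        if self.last is not None:
            self.rate = (level - self.last) / self.dt
        self.last = level
        return self.rate

levels = [10.0, 12.5, 15.0]      # tank level sampled every 2 seconds
display = RateDisplay(dt=2.0)
for reading in levels:
    display.update(reading)
```

Both paths carry the same information; the sketch only makes visible where the transformation burden falls.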

3.3. Reducing the Semantic Distance That Must Be Spanned

Figure 5 provides one illustration of how semantic distance can be changed. In general, there are only two basic ways to reduce the distance, one from the system side (requiring effort on the part of the system designer), the other from the user side (requiring effort on the part of the user). Each direction of bridge building has several components. Here let us consider the following possibilities: (1) The designer can construct higher-level and specialized languages that move toward the user, making the semantics of the input and output languages match that of the user. (2) The user can develop competence by building new mental structures to bridge the gulfs. In particular, this requires the user to automate the response sequence and to learn to think in the same language as that required by the system.

HUTCHINS, HOLLAN, NORMAN

Figure 5.

Matching user's intentions by appropriate output language. The user attempts to control the rate at which the water level in the tank is rising. In (A), the only indication is a meter that shows the current level. This requires the user to observe the meter over time and to do a mental computation on the observations. (B) shows a display that is more semantically direct: The rate of change is graphically indicated. (These illustrations are from the working Steamer system of Hollan, Hutchins, & Weitzman, 1984.)

Higher-Level Languages

One way to bridge the gulf between the intentions of the user and the specifications required by the computer is well known: Provide the user with a higher-level language, one that directly expresses frequently encountered structures of problem decomposition. Instead of requiring the complete decomposition of the task to low-level operations, let the task be described in the same language used within the task domain itself. Although the computer still requires low-level specification, the job of translating from the domain language to the programming language can be taken over by the machine itself.
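The division of labor described here, in which the user speaks the domain language and the machine translates it into low-level operations, can be sketched as follows. This is a deliberately toy illustration with invented names (`move`, `archive`), not a real system.

```python
# A hypothetical sketch of a higher-level interface language: the user
# expresses a task in domain terms, and the system translates it into
# the low-level primitive operations it actually executes.

LOW_LEVEL = []  # trace of primitive operations the "machine" performs

def move(src, dst):
    # A mid-level operation already composed of machine primitives.
    LOW_LEVEL.append(("read", src))
    LOW_LEVEL.append(("write", dst))
    LOW_LEVEL.append(("delete", src))

def archive(document):
    # Domain-level vocabulary: the user says "archive"; the translation
    # down to read/write/delete is the system's job, not the user's.
    move(document, "archive/" + document)

archive("report.txt")
```

The trade-off the article goes on to describe is visible even here: `archive` is easy to say but narrow; anything outside its prepackaged decomposition must fall back to the lower level.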

This implies that designers of higher-level languages should consider how to develop interface languages for which it will be easy for the user to create the mediating structure between intentions and expressions in the language. One way to facilitate this process is to provide consistency across the interface surface. That is, if the user builds a structure to make contact with some part of the interface surface, a savings in effort can be realized if it is possible to use all or part of that same structure to make contact with other areas.

The result of matching a language to the task domain brings both good news and bad news. The good news is that tasks are easier to specify. Even if considerable planning is still required to express a task in a high-level language, the amount of planning and translation that can be avoided by the user and passed off to the machine can be enormous. The bad news is that the language has lost generality. Tasks that do not easily decompose into the terms of the language may be difficult or impossible to represent. In the extreme case, what can be done is easy to do, but outside that specialized domain, nothing can be done. The power of a specialized language system derives from carefully specified primitive operations, selected to match the predicted needs of the user, thus capturing frequently occurring structures of problem decomposition. The trouble is that there is a conflict between generality and matching to any specific problem domain. Some high-level languages and operating systems have attempted to close the gap between user intention and the interaction language while preserving freedom and ease of general expression by allowing for extensibility of the language or operating system. Such systems allow the users to move the interface closer to their conception of the task.

The Lisp language and the UNIX operating system serve as examples of this phenomenon. Lisp is a general-purpose language, but one that has extended itself to match a number of special high-level domains. As a result, Lisp can be thought of as having numerous levels on top of the underlying language kernel. There is a cost to this method. As more and more specialized domain levels get added, the language system gets larger and larger, becoming more clumsy to use, more expensive to support, and more difficult to learn. Just look at any of the manuals for the large Lisp systems (Interlisp, Zetalisp) to get a feel for the complexity involved. The same is true for the UNIX operating system, which started out with a number of low-level, general primitive operations. Users were allowed (and encouraged) to add their own, more specialized operations, or to package the primitives into higher-level operations. The results in all these cases are massive systems that are hard to learn and that require a large amount of support facilities. The documentation becomes huge, and not even system experts know all that is present. Moreover, the difficulty of maintaining such a large system increases the burden on everyone, and the possibility of having standard interfaces to each specialized function has long been given up.

The point is that as the interface approaches the user's intention end of the gulf, functions become more complicated and more specialized in purpose. Because of the incredible variety of human intentions, the lexicon of a language that aspires to both generality of coverage and domain-specific functions can grow very large. In any of the modern dialects of Lisp one sees a microcosm of the argument about high-level languages in general. The fundamentals of the language are simple, but a great deal of effort is required to do anything useful at the low level of the language itself. Higher-level functions written in terms of lower-level ones make the system easier to use when the functions match intentions, but in doing so they may restrict possibilities, proliferate vocabulary, and require that a user know an increasing amount about the language of interaction rather than the domain of action.

Make the Output Show Semantic Concepts Directly

An example of reducing semantic distance on the output side is provided by the scenario of controlling the rate of filling a water tank, described in Figure 5. In that situation, the output display was modified to show rate of flow directly, something normally not displayed but instead left to the user to compute mentally.

In similar fashion, the change from line-oriented text editors to screen-oriented text editors, where the effects of editing commands can be seen instantly, is another example of matching the display to the user's semantics. In general, the development of WYSIWYG ("What You See Is What You Get") systems provides other examples. And finally, spreadsheet programs have been valuable, in part because their output format continually shows the state of the system as values are changed.

The attempt to develop good semantic matches with the system output confronts the same conflict between generality and power faced in the design of input languages. If the system is too specific and specialized, the output displays lack generality. If the system is too rich, the user has trouble learning and selecting among the possibilities. One solution for both the output and input problems is to abandon hope of maintaining general computing and output ability and to develop special-purpose systems for particular domains or tasks. In such a world, the location of the interface in semantic space is pushed closer to the domain language description. Here, things of interest are made simple because the lexicon of the interface language maps well into the lexicon of domain description. Considerable planning may still go on in the conception of the domain itself, but little or no planning or translation is required to get from the language of domain description to the language of the interface. The price paid for these advantages is a loss of generality: Many things are unnatural or even impossible.

Automated Behavior Does Not Reduce Semantic Distance

Cognitive effort is required to plan a sequence of actions to satisfy some intent. Generally, the more structure required of the user, the more effort use of the system will entail. However, this gap can be overcome if the users become familiar enough with the system. Structures that are used frequently need not be rebuilt every time they are needed if they have been remembered. Thus, a user may remember how to do something rather than having to rederive how to do it. It is well known that when tasks are practiced sufficiently often, they become automated, requiring little or no conscious attention. As a result, over time the use of an interface to solve a particular set of problems will feel less difficult and more direct. Experienced users will sometimes argue that the interface they use directly satisfies their intentions, even when less skilled users complain of the complexity of the structures. To skilled users, the interface feels direct because the invocation of mediating structure has been automated. They have learned how to transform frequently arising intentions into action specifications. The result is a feeling of directness as compelling as that which results from semantic directness. As far as such users are concerned, the intention comes to mind and the action gets executed. There are no conscious intervening stages. (For example, a user of the vi text editor expressed this as follows: "I am an expert user of vi, and when I wish to delete a word, all I do is think 'delete that word,' my fingers automatically type 'dw,' and the word disappears from the screen. How could anything be more direct?")

DIRECT MANIPULATION INTERFACES

The frequent use of even a poorly designed interface can sometimes result in a feeling of directness like that produced by a semantically direct interface. A user can compensate for the deficiencies of the interface through continual use and practice so that the ability to use it becomes automatic, requiring little conscious activity. While automatism is one factor that can contribute to a feeling of directness, it is essential for an interface designer to distinguish it from semantic distance. Automatization does not reduce the semantic distance that must be spanned; the gulfs between a user's intentions and the interface must still be bridged by the user. Although practice and the resulting expertise can make the crossing less difficult, it does not reduce the magnitude of the gulfs. Planning activity may be replaced by a single memory retrieval so that instead of figuring out what to do, the user remembers what to do. Automatization may feel like direct control, but it comes about for completely different reasons than semantic directness. Automatization is useful, for it improves the interaction of the user with the system, but the feeling of directness it produces depends only on how much practice a particular user has with the system, and thus it gives the system credit for the work the user has done. Although we need to remember that this happens, that users may adjust themselves to the interface and, with sufficient practice, may view it as directly supporting their intentions, we need to distinguish between the cases in which the feeling of directness originates from a close semantic coupling between intentions and the interface language and those in which it originates from practice. The resultant feeling of directness might be the same in the two cases, but there are crucial differences between how the feeling is acquired and what one needs to do as an interface designer to generate it.


The User Can Adapt to the System Representation

Another way to span the gulf is for the users to change their own conceptualization of the problem so that they come to think of it in the same terms as the system. In some sense, this means that the gulf is bridged by moving the user closer to the system. Because of their experience with the system, the users change both their understanding of the task and the language with which they think about issues. This is related to the notion of linguistic determinism. If it is true that the way we think about something is shaped by the vocabulary we have for talking about it, then it is important for the designer of a system to provide the user with a good representation of the task domain in question. The interface language should provide a powerful, productive way of thinking about the domain.

This form of the users adapting to the system representation takes place at a more fundamental level than the other ways of reducing semantic distance. While moving the interface closer to the users' intentions may make it difficult to realize some intentions, changing the users' conception of the domain may prevent some intentions from arising at all. So while a well-designed special-purpose language may give the users a powerful way of thinking about the domain, it may also restrict the users' flexibility to think about the domain in different ways.

The assumption that a user may change conceptual structure to match the interface language follows from the notion that every interface language implies a representation of the tasks it is applied to. The representation implied by an interface is not always a coherent one. Some interfaces provide a collection of partially overlapping views of a task domain. If a user is to move toward the model implied by the interface, and thus reduce the semantic distance, that model should be coherent and consistent over some conception of the domain. There is, of course, a trade-off here between the costs to the user of learning a new way to think about a domain and the potential added power of thinking about it in the new way.

Virtuosity and Semantic Distance

Sometimes users have a conception of a task and of a system that is broader and more powerful than that provided by an interface. The structures they build to make contact with the interface go beyond it. This is how we characterize virtuoso performances in which the user may "misuse" limited interface tools to satisfy intentions that even the system designer never anticipated. In such cases of virtuosity the notion of semantic distance becomes more complicated, and we need to look very carefully at the task that is being accomplished. Semantic directness always involves the relationship between the task one wishes to accomplish and the ways the interface provides for accomplishing it.


If the task changes, then the semantic directness of the interface may also change.

Consider a musical example: Take the task of producing a middle-C note on two musical instruments, a piano and a violin. For this simple task, the piano provides the more direct interface because all one need do is find the key for middle-C and depress it, whereas on the violin, one must place the bow on the G string, place a choice of fingers in precisely the right location on that string, and draw the bow. A piano's keyboard is more semantically direct than the violin's strings and bow for the simple task of producing notes. The piano has a single well-defined vocabulary item for each of the notes within its range, while the violin has an infinity of vocabulary items, many of which do not produce proper notes at all. However, when the task is playing a musical piece well rather than simply producing notes, the directness of the interfaces can change. In this case, one might complain that a piano has a very indirect interface because it is a machine with which the performer "throws hammers at strings." The performer has no direct contact with the components that actually produce the sound, and so the production of desired nuances in sound is more difficult. Here, as musical virtuosity develops, the task that is to be accomplished also changes, from just the production of notes to concern for how to control more subtle characteristics of the sounds, like vibrato, the slight changes in pitch used to add expressiveness. For this task the violin provides a semantically more direct interface than the piano. Thus, as we have argued earlier, an analysis of the nature of the task being performed is essential in determining the semantic directness of an interface.

3.4. Articulatory Distance

In addition to its meaning, every vocabulary item in every language has a physical form, and that form has an internal structure. Words in natural languages, for example, have phonetic structure when spoken and typographic structure when printed. Similarly, the vocabulary items that constitute an interface language have a physical structure. Where semantic distance has to do with the relationship between the user's intentions and the meanings of expressions, articulatory distance has to do with the relationship between the meanings of expressions and their physical form. On the input side, the form may be a sequence of character-selecting key presses for a command language interface, the movement of a mouse and the associated "mouse clicks" in a pointing-device interface, or a phonetic string in a speech interface. On the output side, the form might be a string of characters, a change in an iconic representation, or variation in an auditory signal.

There are ways to design languages such that the relationships between the forms of the vocabulary items and their meanings are not arbitrary. One technique is to make the physical form of the vocabulary items structurally similar to their meanings. In spoken language this relationship is called onomatopoeia. Onomatopoetic words in spoken language refer to their meanings by imitating the sound they refer to. Thus we talk about the "boom" of explosions or the "cock-a-doodle-doo" of roosters. There is an economy here in that the user's knowledge of the structure of the surface acoustical form has a nonarbitrary relation to meaning. There is a directness of reference in this imitation; an intervening level of arbitrary symbolic relations is eliminated. Other uses of language exploit this effect partially. Thus, although the word "long" is arbitrarily associated with its meaning, sentences like "She stayed a looooooooooong time" exploit a structural similarity between the surface form of "long" (whether written or spoken) and the intended meaning. The same sorts of things can be done in the design of interface languages.

In many ways, interface languages should have an easier time of exploiting articulatory similarity than natural languages do because of the rich technological base available to them. Thus, if the intent is to draw a diagram, the interface might accept drawing motions as input. In turn, it could present diagrams, graphs, and images as output. If one is talking about sound patterns in the input interface language, the output could be the sounds themselves. The computer has the potential to exploit articulatory similarities through technological innovation in the varieties of dimensions upon which it can operate. This potential has not been exploited, in part because of economic constraints. The restriction to simple keyboard input limits the form and structure of the input languages, and the restriction to simple alphanumeric terminals with small, low-resolution screens limits the form and structure of the output languages.

3.5. Articulatory Distance in the Gulfs of Execution and Evaluation

The relationships among semantic distance, articulatory distance, and the gulfs of execution and evaluation are illustrated in Figure 6.

Take the simple, commonplace activity of moving a cursor on the screen. If we do this by moving a mouse, pointing with a finger or a light pen at the screen, or otherwise mimicking the desired motion, then at the level of action execution, these interactions all exhibit articulatory directness. The meaning of the intention is cursor movement and the action is specified by means of a similar movement. One way to achieve articulatory directness at the input side is to provide an interface that permits specification of an action by mimicking it, thus supporting an articulatory similarity between the vocabulary item and its meaning. Any nonarbitrary relationship between the form of an item and its meaning can be a basis for articulatory directness. While structural relationships of form to meaning may be desirable, it is sometimes necessary to resort to an arbitrary relationship of form to meaning. Still, some arbitrary relationships are easier to learn than others. It may be possible to exploit previous user knowledge in creating this relationship. Much of the work on command names in command language interfaces is an instance of trying to develop memorable and discriminable relationships between the forms and the meanings of command names (Black & Moran, 1982; Black & Sebrechts, 1981; Carroll, 1985).

Articulatory directness on the output side is similar. If the user is following the changes in some variable, a moving graphical display can provide articulatory directness. A table of numbers, although containing the same semantic information, does not provide articulatory directness. Thus, the graphical display and the table of numbers might be equal in semantic directness, but unequal in articulatory directness. The goal of designing for articulatory directness is to couple the perceived form of action and meaning so naturally that the relationships between intentions and actions and between actions and output seem straightforward and obvious.

In general, articulatory directness is highly dependent upon I/O technology. Increasing the articulatory directness of actions and displays requires a much richer set of input/output devices than most systems currently have. In addition to keyboards and bit-mapped screens, we see the need for various forms of pointing devices. Such pointing devices have important spatio-mimetic properties and thus support the articulatory directness of input for tasks that can be represented spatially. The mouse is useful for a wide variety of tasks not because of any properties inherent in itself, but because we map so many kinds of relationships (even ones that are not intrinsically spatial) onto spatial metaphors. In addition, there are often needs for sound and speech, certainly as outputs, and possibly as inputs.
Precise control of timing will be necessary for those applications where the domain of interest is time sensitive. Perhaps it is stretching the imagination beyond its willing limits, but Galton (1894) suggested and carried out a set of experiments on doing arithmetic by sense of smell. Less fancifully conceived, input might be sensitive not only to touch, place, and timing, but also to pressure or to torque (see Buxton, 1986; Minsky, 1984).
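As an illustration of articulatory directness on the input side, consider how a pointing device maps the form of the user’s gesture onto its meaning. The sketch below is hypothetical (not from the chapter): relative mouse motion is translated into cursor motion of the same spatial shape, so the action mimics the intention, whereas a symbolic command language would instead require an expression such as MOVE CURSOR 30 40.

```python
# Hypothetical sketch: mimetic (articulatorily direct) cursor control.
# The input gesture and its meaning share the same spatial form.
def move_cursor(cursor, mouse_dx, mouse_dy, gain=1.0):
    """Translate a relative mouse motion into a cursor motion."""
    x, y = cursor
    return (x + gain * mouse_dx, y + gain * mouse_dy)

# Dragging the mouse right and up moves the cursor right and up:
cursor = move_cursor((100, 100), 30, -40)  # -> (130.0, 60.0)
```

The `gain` parameter is an illustrative assumption standing in for the device-to-screen mapping; the point is only that the relation between action and meaning is structural, not symbolic.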

4. DIRECT ENGAGEMENT

Direct engagement occurs when a user experiences direct interaction with the objects in a domain. Here there is a feeling of involvement directly with a world of objects rather than of communication with an intermediary. The interactions are much like interacting with objects in the physical world. Actions apply to the objects, observations are made directly upon those objects, and the interface and the computer become invisible. Although we believe this feeling of direct engagement to be of critical importance, in fact, we know little about the actual requirements for producing it. Laurel (1986) discusses some of the requirements. At a minimum, to allow a feeling of direct engagement the system requires the following:

Execution and evaluation should exhibit both semantic and articulatory directness.

Input and output languages of the interface should be inter-referential, allowing an input expression to incorporate or make use of a previous output expression. This is crucial for creating the illusion that one is directly manipulating the objects of concern.

The system should be responsive, with no delays between execution and the results, except where those delays are appropriate for the knowledge domain itself.

The interface should be unobtrusive, not interfering or intruding. If the interface itself is noticed, then it stands in a third-person relationship to the objects of interest, and detracts from the directness of the engagement.

In order to have a feeling of direct engagement, the interface must provide the user with a world in which to interact. The objects of that world must feel like they are the objects of interest, that one is doing things with them and watching how they react. In order for this to be the case, the output language must present representations of objects in forms that behave in the way that the user thinks of the objects behaving. Whatever changes are caused in the objects by the set of operations must be depicted in the representation of the objects. This use of the same object as both an input and output entity is essential to providing objects that behave as if they are the real thing. It is because an input expression can contain a previous output expression that the user feels the output expression is the thing itself and that the operation is applied directly to the thing itself.

In addition, all of the discussions of semantic and articulatory directness apply here too, because the designer of the interface must be concerned with what is to be done and how one articulates that in the languages of interaction. But the designer must also be concerned with creating and supporting an illusion. The specification of what needs to be done and evidence that it has been done must not violate the illusion, else the feeling of direct engagement will be lost.

One factor that seems especially relevant to maintaining this illusion is the form and speed of feedback. Rapid feedback in terms of changes in the behavior of objects not only allows for the modification of actions even as they are being executed, but also supports the feeling of acting directly on the objects themselves. It removes the perception of the computer as an intermediary by providing continual representation of system state. In addition, rapidity of feedback and continual representation of state allows one to make use of perceptual faculties in evaluating the outcome of actions. We can watch the actions take place, monitoring them much like we monitor our interactions with the physical world. The reduction in the cognitive load of mentally maintaining relevant information and the form of the interaction contribute to the feeling of engagement.

5. A SPACE OF INTERFACES

Distance and engagement are depicted in Figure 7 as two major dimensions in a space of interface designs. The dimension of engagement has two landmark values: One is the metaphor of interface as conversation; the other is the metaphor of interface as model world. The dimension of distance actually contains two distances to be spanned: semantic and articulatory distances, the two kinds of gulfs that lie between the user’s conception of the task and the interface language.

The least direct interface is often one that provides a low-level language interface, for this is apt to provide the weakest semantic match between intentions and the language of the interface. In this case, the interface is an intermediary between the user and the task. Even worse, it is an intermediary that does not understand actions at the level of description in which the user likes to think of them. Here the user must translate intentions into complex or lengthy expressions in the language that the interface intermediary can understand.

A more direct situation arises when the central metaphor of the interface is a world. Then the user can be directly engaged with the objects in a world; but still, if the actions in that world do not match those that the user wishes to perform within the task domain, getting the task done may be a difficult process. The user may believe that things are getting done and may even experience a sense of engagement with the world, yet still be doing things at too low a level. This is the state of some of the recently introduced direct manipulation systems: They produce an immediate sense of engagement, but as the user develops experience with the system, the interface appears clumsy, to interfere too much, and to demand too many actions and decisions at the wrong level of specification. These interfaces appear on the surface to be direct manipulation interfaces, but they fail to produce the proper feelings of direct engagement with the task world.

Closing the distance between the user’s intentions and the level of specification of the interface language allows the user to make efficient specifications of intentions. Where this is done with a high-level language, quite efficient interfaces can be designed. This is the situation in most modern integrated programming environments. For some classes of tasks, such interfaces may be superior to direct manipulation interfaces.

FIGURE 7

A space of interfaces. The dimensions of distance from user goals and degree of engagement form a space of interfaces within which we can locate some familiar types of interfaces. Direct manipulation interfaces are those that minimize the distances and maximize engagement. As always, the distance between user intentions and the interface language depends on the nature of the task the user is performing. (The engagement axis runs from interface as conversation to interface as model world.)

Finally, the most direct of the interfaces will lie where engagement is maximized, where just the right semantic and articulatory matches are provided, and where all distances are minimized.

6. PROBLEMS WITH DIRECT MANIPULATION

Direct manipulation systems have both virtues and vices. For instance, the immediacy of feedback and the natural translation of intentions to actions make some tasks easy. The matching of levels of thought to the interface language - semantic directness - increases the ease and power of performing some activities at a potential cost of generality and flexibility. But not all things should be done directly. For example, a repetitive operation is probably best done via a script, that is, through a symbolic description of the tasks that are to be accomplished. Direct manipulation interfaces have difficulty handling variables, or distinguishing the depiction of an individual element from a representation of a set or class of elements. Direct manipulation interfaces have problems with accuracy, for the notion of mimetic action puts the responsibility on the user to control actions with precision, a responsibility that is sometimes best handled through the intelligence of the system and sometimes best communicated symbolically.
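The contrast between direct action and a symbolic script can be made concrete. In the hypothetical sketch below, a repetitive renaming task is expressed once, over a whole class of elements, rather than performed on each element by direct manipulation; the function name and the naming scheme are illustrative assumptions, not from the chapter.

```python
# Hypothetical sketch: a script describes an operation symbolically,
# over a set of elements, instead of manipulating each one directly.
def thumbnail_names(filenames):
    """Derive a thumbnail name for every image in the list."""
    return [name.replace(".png", "_thumb.png") for name in filenames]

thumbnail_names(["a.png", "b.png"])  # -> ["a_thumb.png", "b_thumb.png"]
```

The script handles a variable number of elements uniformly, which is exactly what the mimetic, one-object-at-a-time style of interaction handles poorly.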

A more fundamental problem with direct manipulation interfaces arises from the fact that much of the appeal and power of this form of interface comes from its ability to directly support the way we normally think about a domain. A direct manipulation interface amplifies our knowledge of the domain and allows us to think in the familiar terms of the application domain rather than in those of the medium of computation. But if we restrict ourselves to only building an interface that allows us to do things we can already do and to think in ways we already think, we will miss the most exciting potential of new technology: to provide new ways to think of and to interact with a domain. Providing these new ways and creating conditions that will make them feel direct and natural is an important challenge to the interface designer.

Direct manipulation interfaces are not a panacea. Although with sufficient practice by the user many interfaces can come to feel direct, a properly designed interface, one which exploits semantic and articulatory directness, should decrease the amount of learning required and provide a natural mapping to the task. But interface design is subject to many tradeoffs. There are surely instances when one might wisely trade off directness for generality, or for more facile ways of saying abstract things. The articulatory directness involved in pointing at objects might need to be traded off against the difficulties of moving the hands between input devices or of problems in pointing with great precision.

It is important not to equate directness with ease of use. Indeed, if the interface is really invisible, then the difficulties within the task domain get transferred directly into difficulties for the user. Suppose the user struggles to formulate an intention because of lack of knowledge of the task domain. The user may complain that the system is difficult to use. But the difficulty is in the task domain, not in the interface language. Direct manipulation interfaces do not pretend to assist in overcoming problems that result from poor understanding of the task domain.

What about the claims for direct manipulation? We believe that direct manipulation systems carry gains in ease of learning and ease of use. If the mapping is done correctly, then both the form and the meaning of commands should be easier to acquire and retain. Interpretation of the output should be immediate and straightforward. If the interface is a model of the task domain, then one could have the feeling of directly engaging the problem of interest itself. It is sometimes said that in such situations the interface disappears. It is probably more revealing to say that the interface is no longer recognized as an interface.

But are these desirable features? Are the trade-offs too costly? As always, we are sure that the answer will depend on the tasks to be accomplished. Certain kinds of abstraction that are easy to deal with in language seem difficult in a concrete model of a task domain. When we give up the conversation metaphor, we also give up dealing in descriptions, and in some contexts, there is great power in descriptions. As an interface to a programming task, direct manipulation interfaces are problematic. We know of no really useful direct manipulation programming environments. Issues such as controlling the scope of variable bindings promise to be quite tricky in the direct manipulation environments. Will direct manipulation systems live up to their promise? Yes and no. Basically, the systems will be good and powerful for some purposes, poor and weak for others. In the end, many things done today will be replaced by direct manipulation systems. But we will still have conventional programming languages.

On the surface, the fundamental idea of a direct manipulation interface to a task flies in the face of two thousand years of development of abstract formalisms as a means of understanding and controlling the world. Until very recently, the use of computers has been an activity squarely in that tradition. So the exterior of direct manipulation, providing as it does for the direct control of a specific task world, seems somehow atavistic, a return to concrete thinking. On the inside, of course, the implementation of direct manipulation systems is yet another step in that long, formal tradition. The illusion of the absolutely manipulable concrete world is made possible by the technology of abstraction.

Acknowledgments. We thank Ben Shneiderman for his helpful comments on an earlier draft of the chapter, Eileen Conway for her aid with the illustrations, and Julie Norman and Sondra Buffett for extensive editorial comments.

Support. The research reported here was conducted under Contract N00014-85-C-0133, NR 667-541 with the Personnel and Training Research Programs of the Office of Naval Research and with the support of the Navy Personnel Research and Development Center. The views and conclusions contained in this document are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of the sponsoring agency.

REFERENCES

Black, J. B., & Moran, T. P. (1982). Learning and remembering command names. Proceedings of the Human Factors in Computer Systems Conference, 8-11. New York: ACM.

Black, J. B., & Sebrechts, M. M. (1981). Facilitating human-computer communication. Applied Psycholinguistics, 2, 149-177.

Borning, A. (1979). ThingLab: A constraint-oriented simulation laboratory (Tech. Rep. No. SSL-79-3). Palo Alto, CA: Xerox Palo Alto Research Center.


HUTCHINS, HOLLAN, NORMAN

Budge, B. (1983). Pinball Construction Set [Computer program]. San Mateo, CA: Electronic Arts.

Buxton, W. (1986). There’s more to interaction than meets the eye: Some issues in manual input. In D. A. Norman & S. W. Draper (Eds.), User centered system design: New perspectives on human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Carroll, J. M. (1985). What’s in a name? An essay in the psychology of reference. New York: Freeman.

diSessa, A. A. (1985). A principled design for an integrated computational environment. Human-Computer Interaction, 1, 1-47.

Draper, S. W. (1986). Display managers as the basis for user-machine communication. In D. A. Norman & S. W. Draper (Eds.), User centered system design: New perspectives on human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Galton, F. (1894). Arithmetic by smell. Psychological Review, 1, 61-62.

Hollan, J. D., Hutchins, E., & Weitzman, L. (1984). Steamer: An interactive inspectable simulation-based training system. AI Magazine, 5, 15-27.

Hollan, J. D., Stevens, A., & Williams, M. D. (1980). Steamer: An advanced computer-assisted instruction system for propulsion engineering. Proceedings of Summer Computer Simulation Conference, 400-404. Arlington, VA: AFIPS Press.

Kay, A. (1984, September). Computer software. Scientific American, 52-59.

Laurel, B. K. (1986). Interface as mimesis. In D. A. Norman & S. W. Draper (Eds.), User centered system design: New perspectives on human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Minsky, M. R. (1984, July). Manipulating simulated objects with real-world gestures using a force and position sensitive screen. Computer Graphics, 195-203.

Norman, D. A., & Draper, S. W. (Eds.). (1986). User centered system design: New perspectives on human-computer interaction. Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.

Perlis, A. J. (1982). Epigrams on programming. SIGPLAN Notices, 17(9), 7-13.

Shneiderman, B. (1974). A computer graphics system for polynomials. The Mathematics Teacher, 67(2), 111-113.

Shneiderman, B. (1982). The future of interactive systems and the emergence of direct manipulation. Behaviour and Information Technology, 1, 237-256.

Shneiderman, B. (1983). Direct manipulation: A step beyond programming languages. IEEE Computer, 16(8), 57-69.

Sutherland, I. E. (1963). Sketchpad: A man-machine graphical communication system. Proceedings of the Spring Joint Computer Conference, 329-346. Baltimore, MD: Spartan Books.

HCI Editorial Record. This is an invited paper based on a draft of April 1, 1985. Final manuscript received October 3, 1985. -Editor

Human-Computer Interaction: An Empirical Research Perspective

I. Scott MacKenzie

2 The Human Factor

The deepest challenges in human-computer interaction (HCI) lie in the human factor. Humans are complicated. Computers, by comparison, are simple. Computers are designed and built and they function in rather strict terms according to their programmed capabilities. There is no parallel with humans. Human scientists (including those in HCI) confront something computer scientists rarely think about: variability. Humans differ. We’re young, old, female, male, experts, novices, left-handed, right-handed, English-speaking, Chinese-speaking, from the north, from the south, tall, short, strong, weak, fast, slow, able-bodied, disabled, sighted, blind, motivated, lazy, creative, bland, tired, alert, and on and on. The variability humans bring to the table means our work is never precise. It is always approximate. Designing systems that work well, period, is a lofty goal, but unfortunately, it is not possible to the degree we would like to achieve. A system might work well for a subset of people, but venture to the edges along any dimension (see list above), and the system might work poorly, or not at all. It is for this reason that HCI designers have precepts like “know thy user” (Shneiderman and Plaisant, 2005, p. 66).

Researchers in HCI have questions—lots of them. We are good at the small ones, but the big ones are difficult to answer: Why do humans make mistakes? Why do humans forget how to do things? Why do humans get confused while installing apps on their computers? Why do humans have trouble driving while talking on a mobile phone? Why do humans enjoy Facebook so much? Obviously, the human part is hugely important and intriguing. The more we understand humans, the better are our chances of designing interactive systems—interactions—that work as intended. So in this chapter I examine the human, but the computer and the interaction are never far away.

The questions in the preceding paragraph begin with why. They are big questions. Unfortunately, they do not lend themselves to empirical enquiry, which is the focus of this book. Take the first question: Why do humans make mistakes? From an empirical research perspective, the question is too broad. It cannot be answered with any precision. Our best bet is to narrow in on a defined group of humans (a population) and ask them to do a particular task on a particular system in a particular environment. We observe the interaction and measure the behavior. Along the way, we log the mistakes, classify them, count them, and take note of where and how the mistakes occurred. If our methodology is sound, we might assimilate enough information to put forth an answer to the why question—in a narrow sense.
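The logging-and-counting step described above can be sketched in a few lines; the mistake categories here are invented purely for illustration.

```python
from collections import Counter

# Hypothetical sketch: classify each observed mistake, then count by
# category, as a narrowly focused empirical study might.
observed_mistakes = ["typo", "wrong menu item", "typo", "missed target", "typo"]
counts = Counter(observed_mistakes)
counts.most_common(1)  # -> [("typo", 3)]
```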

Human-Computer Interaction.

© 2013 Elsevier Inc. All rights reserved.

27

If we do enough research like this, we might develop an answer in a broad sense. But a grounded and rigorous approach to empirical research requires small and narrowly focused questions.

Descriptive models, which will be discussed in Chapter 7, seek to delineate and categorize a problem space. They are tools for thinking, rather than tools for predicting. A descriptive model for “the human” would be useful indeed. It would help us get started in understanding the human, to delineate and categorize aspects of the human that are relevant to HCI. In fact there are many such models, and I will introduce several in this chapter.

2.1 Time scale of human action

Newell’s Time Scale of Human Action is a descriptive model of the human (Newell, 1990, p. 122). It delineates the problem space by positioning different types of human actions in timeframes within which the actions occur. (See Figure 2.1.) The model has four bands, a biological band, a cognitive band, a rational band, and a social band. Each band is divided into three levels. Time is ordered by seconds and appears on a logarithmic scale, with each level a factor of ten longer than the level below it. The units are microseconds at the bottom and months at the top. For nine levels, Newell ascribes a label for the human system at work (e.g., operations or task). Within these labels we see a connection with HCI. The labels for the bands suggest a worldview or theory of human action.
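To make the band boundaries concrete, the sketch below (a hypothetical helper, not part of Newell’s model) maps a task duration in seconds onto the band implied by the figure’s levels, which run from 10^-4 s up to 10^7 s.

```python
# Hypothetical sketch: locate a task duration on Newell's time scale.
# Each band spans three factor-of-ten levels, following Figure 2.1.
BANDS = [
    (1e-4, 1e-1, "biological"),
    (1e-1, 1e2, "cognitive"),
    (1e2, 1e5, "rational"),
    (1e5, 1e8, "social"),
]

def band(seconds):
    """Return the band containing the given duration, if any."""
    for low, high, name in BANDS:
        if low <= seconds < high:
            return name
    return "outside the modeled range"

band(0.3)   # a deliberate act        -> "cognitive"
band(3600)  # an hour-long task       -> "rational"
```

The half-open intervals are a design choice for the sketch; Newell’s figure places labels on levels rather than defining crisp boundaries.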

The most common dependent variable in experimental research in HCI is time—the time for a user to do a task. In this sense, Newell’s time-scale model is relevant to HCI. The model is also appropriate because it reflects the multidisciplinary nature of the field. HCI research is both high level and low level, and we see this in the model. If desired, we could select a paper at random from an HCI conference proceedings or journal, study it, then position the work somewhere in Figure 2.1. For example, research on selection techniques, menu design, force or auditory feedback, text entry, gestural input, and so on, is within the cognitive band. The tasks for these interactions typically last on the order of a few hundred milliseconds (ms) to a few dozen seconds. Newell characterizes these as deliberate acts, operations, and unit tasks.

Up in the rational band, users are engaged in tasks that span minutes, tens of minutes, or hours. Research topics here include web navigation, user search strategies, user-centered design, collaborative computing, ubiquitous computing, social navigation, and situated awareness. Tasks related to these research areas occupy users for minutes or hours.

Tasks lasting days, weeks, or months are in the social band. HCI topics here might include workplace habits, groupware usage patterns, social networking, online dating, privacy, media spaces, user styles and preferences, design theory, and so on.

Another insight Newell’s model provides pertains to research methodology. Research at the bottom of the scale is highly quantitative in nature. Work in the biological band, for example, is likely experimental and empirical—at the level of neural


Scale Time (sec)   Units     System          World (theory)
10^7               Months                    SOCIAL BAND
10^6               Weeks                     SOCIAL BAND
10^5               Days                      SOCIAL BAND
10^4               Hours     Task            RATIONAL BAND
10^3               10 min    Task            RATIONAL BAND
10^2               Minutes   Task            RATIONAL BAND
10^1               10 sec    Unit task       COGNITIVE BAND
10^0               1 sec     Operations      COGNITIVE BAND
10^-1              100 ms    Deliberate act  COGNITIVE BAND
10^-2              10 ms     Neural circuit  BIOLOGICAL BAND
10^-3              1 ms      Neuron          BIOLOGICAL BAND
10^-4              100 µs    Organelle       BIOLOGICAL BAND

FIGURE 2.1

Newell’s time scale of human action.

(From Newell, 1990, p. 122)

impulses. At the top of the scale, the reverse is true. In the social band, research methods tend to be qualitative and non-experimental. Techniques researchers employ here include interviews, observation, case studies, scenarios, and so on. Furthermore, the transition between qualitative and quantitative methods moving from top to bottom in the figure is gradual. As one methodology becomes more prominent, the other becomes less prominent. Researchers in the social band primarily use qualitative methods, but often include some quantitative methods. For example, research on workplace habits, while primarily qualitative, might include some quantitative methods (e.g., counting the number of personal e-mails sent each day while at work). Thus, qualitative research in the social band also includes some quantitative assessment. Conversely, researchers in the cognitive band primarily use quantitative methods but typically include some qualitative methods. For example, an experiment on human performance with pointing devices, while primarily quantitative, might include an interview at the end to gather comments and suggestions on the interactions. Thus, quantitative, experimental work in the cognitive band includes some qualitative assessment as well.

Newell speculates further on bands above the social band: a historical band operating at the level of years to thousands of years, and an evolutionary band operating at the level of tens of thousands to millions of years (Newell, 1990, p. 152). We will forgo interpreting these in terms of human-computer interaction.

2.2 Human factors

There are many ways to characterize the human in interactive systems. One is the model human processor of Card et al. (1983), which was introduced in Chapter 1.

FIGURE 2.2

Human factors view of the human operator in a work environment.

(After Kantowitz and Sorkin, 1983, p. 4)

Other characterizations exist as well. Human factors researchers often use a model showing a human operator confronting a machine, like the image in Figure 2.2. The human monitors the state of the computer through sensors and displays and controls the state of the computer through responders and controls. The dashed vertical line is important since it is at the interface where interaction takes place. This is the location where researchers observe and measure the behavioral events that form the interaction.

Figure 2.2 is a convenient way to organize this section, since it simplifies the human to three components: sensors, responders, and a brain.

2.3 Sensors

Rosa: You deny everything except what you want to believe. That’s the sort of man you are.
Bjartur: I have my five senses, and don’t see what need there is for more.

(Halldór Laxness, Independent People)

The five classical human senses are vision, hearing, taste, smell, and touch. Each brings distinctly different physical properties of the environment to the human. One feature the senses share is the reception and conversion into electrical nerve signals of physical phenomena such as sound waves, light rays, flavors, odors, and physical contact. The signals are transmitted to the brain for processing. Sensory stimuli and sense organs are purely physiological. Perception, discussed later, includes both the sensing of stimuli and use of the brain to develop identification, awareness, and understanding of what is being sensed. We begin with the first of the five senses just noted: vision.

2.3.1 Vision (Sight)

Vision, or sight, is the human ability to receive information from the environment in the form of visible light perceived by the eye. The visual sensory channel

FIGURE 2.3

The eye.

FIGURE 2.4

The fovea image spans a region a little more than one degree of visual angle.

is hugely important, as most people obtain about 80 percent of their information through the sense of sight (Asakawa and Takagi, 2007). The act of seeing begins with the reception of light through the eye’s lens. The lens focuses the light into an image projected on to the retina at the back of the eye. (See Figure 2.3.) The retina is a transducer, converting visible light into neurological signals sent to the brain via the optic nerve.

Near the center of the retina is the fovea, which is responsible for sharp central vision, such as reading or watching television. The fovea image in the environment encompasses a little more than one degree of visual angle, approximately equivalent to the width of one’s thumb at arm’s length (see Figure 2.4). Although the fovea is only about 1 percent of the retina in size, the neural processing associated with the fovea image engages about 50 percent of the visual cortex in the brain.
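The geometry behind “degrees of visual angle” is simple enough to compute directly. In the sketch below, the angle subtended by an object of size s viewed from distance d is 2·atan(s / 2d); the object size and viewing distance in the example are illustrative assumptions, not figures from the chapter.

```python
import math

# Visual angle subtended by an object of a given size (meters) viewed
# from a given distance (meters): theta = 2 * atan(size / (2 * distance)).
def visual_angle_deg(size, distance):
    return math.degrees(2 * math.atan(size / (2 * distance)))

# A 12 mm object viewed from 65 cm subtends roughly one degree,
# comparable to the span of the fovea image described above:
visual_angle_deg(0.012, 0.65)  # -> about 1.06 degrees
```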

As with other sensory stimuli, light has properties such as intensity and frequency.

Frequency. Frequency is the property of light leading to the perception of color. Visible light is a small band in the electromagnetic spectrum, which ranges from radio waves to x-rays and gamma rays. Different colors are positioned within the visible spectrum of electromagnetic waves, with violet at one end (390 nanometers) and red at the other (750 nm). (See Figure 2.5; colors not apparent in grayscale print.)

FIGURE 2.5

The visible spectrum of electromagnetic waves.

Intensity. Although the frequency of light is a relatively simple concept, the same cannot be said for the intensity of light. Quantifying light intensity, from the human perspective, is complicated because the eye’s light sensitivity varies by the wavelength of the light and also by the complexity of the source (e.g., a single frequency versus a mixture of frequencies). Related to intensity is luminance, which refers to the amount of light passing through a given area. With luminance comes brightness, a subjective property of the eye that includes perception by the brain. The unit for luminance is candela per square meter (cd/m²).

Fixations and saccades. Vision is more than the human reception of electromagnetic waves having frequency and intensity. Through the eyes, humans look at and perceive the environment. In doing so, the eyes engage in two primitive actions: fixations and saccades. During a fixation, the eyes are stationary, taking in visual detail from the environment. Fixations can be long or short, but typically last at least 200 ms. Changing the point of fixation to a new location requires a saccade—a rapid repositioning of the eyes to a new position. Saccades are inherently quick, taking only 30–120 ms. Early and influential research on fixations and saccades was presented in a 1965 publication in Russian by Yarbus, translated as Eye Movements and Vision (reviewed in Tatler, Wade, Kwan, Findlay, and Velichkovsky, 2010). Yarbus demonstrated a variety of inspection patterns for people viewing scenes. One example used The Unexpected Visitor by painter Ilya Repin (1844–1930). Participants were given instructions and asked to view the scene, shown in Figure 2.6a. Eye movements (fixations and saccades) were recorded and plotted for a variety of tasks. The results for one participant are shown in Figure 2.6b for the task “remember the position of people and objects in the room” and in Figure 2.6c for the task “estimate the ages of the people.” Yarbus provided many diagrams like this, with analyses demonstrating differences within and between participants, as well as changes in viewing patterns over time and for subsequent viewings. He noted, for example, that inspection patterns were more similar across viewings for a single viewer than between viewers.
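In eye tracking software, fixations and saccades are typically recovered from a raw stream of gaze samples. One common approach is dispersion-threshold identification: samples that stay within a small spatial window for long enough form a fixation. The sketch below is a minimal illustration of that idea; the threshold values and the data format are assumptions, not taken from Yarbus or the text.

```python
def detect_fixations(samples, max_dispersion=30.0, min_duration=0.2):
    """Classify a gaze sample stream into fixations using a simple
    dispersion-threshold scheme. Thresholds here are illustrative.

    samples: list of (t, x, y) tuples, t in seconds, x/y in pixels.
    Returns a list of (start_t, end_t, mean_x, mean_y) fixations.
    """
    fixations = []
    window = []

    def emit(w):
        # Record the window as a fixation if it lasted long enough.
        if w and w[-1][0] - w[0][0] >= min_duration:
            fixations.append((w[0][0], w[-1][0],
                              sum(p[1] for p in w) / len(w),
                              sum(p[2] for p in w) / len(w)))

    for s in samples:
        window.append(s)
        xs = [p[1] for p in window]
        ys = [p[2] for p in window]
        dispersion = (max(xs) - min(xs)) + (max(ys) - min(ys))
        if dispersion > max_dispersion:
            # The new sample broke the window apart: emit the old window.
            emit(window[:-1])
            window = [s]
    emit(window)
    return fixations
```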

HCI research in eye movements has several themes. One is analyzing how people read and view content on web pages. Figure 2.7 shows an example of a

FIGURE 2.6

Yarbus’ research on eye movements and vision (Tatler et al., 2010). (a) Scene. (b) Task:

Remember the position of the people and objects in the room. (c) Task: Estimate the ages of the people.

scanpath (a sequence of fixations and saccades) for a user viewing content at different places on a page. (See also J. H. Goldberg and Helfman, 2010, Figure 2.) The results of the analyses offer implications for page design. For example, advertisers might want to know about viewing patterns and how males and females differ in viewing content. There are gender differences in eye movements

FIGURE 2.7

Scanpath for a user locating content on a web page.

(Pan et al., 2004), but it remains to be demonstrated how low-level experimental results can inform and guide design.

2.3.2 Hearing (Audition)

Hearing, or audition, is the detection of sound by humans. Sound is transmitted through the environment as sound waves—cyclic fluctuations of pressure in a medium such as air. Sound waves are created when physical objects are moved or vibrated, thus creating fluctuations in air pressure. Examples include plucking a string on a guitar, slamming a door, shuffling cards, or a human speaking. In the latter case, the physical object creating the sound is the larynx, or vocal cords, in the throat.

Hearing occurs when sound waves reach a human’s ear and stimulate the eardrum to create nerve impulses that are sent to the brain. A single sound from a single source has at least four physical properties: intensity (loudness), frequency (pitch), timbre, and envelope. As a simple example, consider a musical note played from an instrument such as a trumpet. The note may be loud or soft (intensity); high or low (frequency). We hear and recognize the note as coming from a trumpet, as opposed to a flute, because of the note’s timbre and envelope. Let’s examine each of these properties.

Loudness. Loudness is the subjective analog to the physical property of intensity. It is quantified by sound pressure level, which expresses the pressure in a sound wave relative to the average pressure in the medium. The unit of sound pressure level is the decibel (dB). Human hearing begins with sounds of 0–10 dB. Conversational speech is about 50–70 dB in volume. Pain sets in when humans are exposed to sounds of approximately 120–140 dB.
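The decibel is a logarithmic unit: sound pressure level is 20·log10(p/p0), where p0 = 20 micropascals is the conventional reference pressure in air, near the threshold of hearing. A small sketch (the speech pressure value used in the example is illustrative):

```python
import math

P_REF = 20e-6  # reference sound pressure in air: 20 micropascals

def spl_db(pressure_pa):
    """Sound pressure level in decibels for an RMS pressure in pascals."""
    return 20 * math.log10(pressure_pa / P_REF)

# The reference pressure itself sits at 0 dB, the start of human hearing.
print(round(spl_db(20e-6)))   # 0

# A pressure of about 0.02 Pa lands at 60 dB, within the 50-70 dB
# range cited for conversational speech.
print(round(spl_db(0.02)))    # 60
```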

Pitch. Pitch is the subjective analog of frequency, which is the reciprocal of the time between peaks in a sound wave’s pressure pattern. The units of pitch are cycles per second, or Hertz (Hz). Humans can perceive sounds in the frequency range of about 20 Hz–20,000 Hz (20 kHz), although the upper limit tends to decrease with age.

Timbre. Timbre (aka richness or brightness) results from the harmonic structure of sounds. Returning to the example of a musical note, harmonics are integer multiples of a note’s base frequency. For example, a musical note with base frequency of 400 Hz includes harmonics at 800 Hz, 1200 Hz, 1600 Hz, and so on. The relative amplitudes of the harmonics create the subjective sense of timbre, or richness, in the sound. While the human hears the note as 400 Hz, it is the timbre that distinguishes the tone as being from a particular musical instrument. For example, if notes of the same frequency and loudness are played from a trumpet and an oboe, the two notes sound different, in part, because of the unique pattern of harmonics created by each instrument. The purest form of a note is a sine wave, which includes the base frequency but no harmonics above the base frequency. The musical notes created by a flute are close to sine waves.
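The harmonic structure described above is easy to sketch numerically: a note is a sum of sinusoids at integer multiples of the base frequency, and the pattern of amplitudes across harmonics is what varies between instruments. The amplitude values below are invented for illustration, not measured from any real instrument.

```python
import math

def note_sample(t, base_freq, harmonic_amps):
    """One sample of a note: a sum of sinusoids at integer multiples of
    the base frequency. harmonic_amps[0] scales the base frequency,
    harmonic_amps[1] the second harmonic, and so on."""
    return sum(a * math.sin(2 * math.pi * base_freq * (k + 1) * t)
               for k, a in enumerate(harmonic_amps))

pure_sine = [1.0]                   # base frequency only: no harmonics
bright    = [1.0, 0.6, 0.4, 0.25]   # strong upper harmonics: "brighter" timbre

# Both notes are heard as 400 Hz; the harmonic mix distinguishes them.
rate = 8000
pure = [note_sample(i / rate, 400, pure_sine) for i in range(rate)]
rich = [note_sample(i / rate, 400, bright) for i in range(rate)]
```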

Envelope. Envelope is the way a note and its harmonics build up and transition in time—from silent to audible to silent. There is considerable information in the onset envelope, or attack, of musical notes. In the example above of the trumpet and oboe playing notes of the same frequency and same loudness, the attack also assists in distinguishing the source. If the trumpet note and oboe note were recorded and played back with the attack removed, it would be surprisingly difficult to distinguish the instruments. The attack results partly from inherent properties of instruments (e.g., brass versus woodwind), but also from the way notes are articulated (e.g., staccato versus legato).

Besides physical properties, sound has perceptual properties, which arise in human hearing and perception. Complex sounds can be described as being harmonious (pleasant) or discordant (unpleasant). This property has to do with how different frequencies mix together in a complex sound, such as a musical chord. Sounds may also convey a sense of urgency or speed.

Humans have two ears, but each sound has a single source. The slight difference in the physical properties of the sound as it arrives at each ear helps humans in identifying a sound’s location (direction and distance). When multiple sounds from multiple sources are heard through two ears, perceptual effects such as stereo emerge.
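One binaural cue for localization is the interaural time difference: sound from one side reaches the near ear slightly earlier than the far ear. A crude sketch under a straight-path assumption (real models account for the shape of the head; the ear-spacing value here is an approximation):

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at room temperature
EAR_SPACING = 0.21      # m, approximate distance between the ears

def interaural_time_difference(azimuth_deg):
    """Approximate arrival-time difference (seconds) between the two
    ears for a distant source at the given azimuth (0 = straight ahead),
    using a simplified straight-path model rather than a head model."""
    return EAR_SPACING * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND

# A source directly to one side arrives roughly 0.6 ms earlier at the
# near ear; a source straight ahead arrives at both ears together.
print(round(interaural_time_difference(90) * 1000, 2))
print(round(interaural_time_difference(0) * 1000, 2))   # 0.0
```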

Sounds provide a surprisingly rich array of cues to humans, whether walking about while shopping or sitting in front of a computer typing an e-mail message. Not surprisingly, sound is crucial for blind users, for example, in conveying information about the location and distance of environmental phenomena (Talbot and Cowan, 2009).

2.3.3 Touch (Tactition)

Although touch, or tactition, is considered one of the five traditional human senses, touch is just one component of the somatosensory system. This system includes sensory receptors in the skin, muscles, bones, joints, and organs that provide information on a variety of physical or environmental phenomena, including touch, temperature, pain, and body and limb position. Tactile feedback, in HCI, refers to information provided through the somatosensory system from a body part, such as a finger, when it is in contact with (touching) a physical object. Additional information, such as the temperature, shape, texture, or position of the object, or the amount of resistance, is also conveyed.

All user interfaces that involve physical contact with the user’s hands (or other body parts) include tactile feedback. Simply grasping a mouse and moving it brings considerable information to the human operator: the smooth or rubbery feel of the mouse chassis, slippery or sticky movement on the desktop. Interaction with a desktop keyboard is also guided by tactile feedback. The user senses the edges and shapes of keys and experiences resistance as a key is pressed. Tactile identifiers on key tops facilitate eyes-free touch typing. Identifiers are found on the 5 key for numeric keypads and on the F and J keys for alphanumeric keyboards. Sensing the identifier informs the user that the home position is acquired. (See Figure 2.8a.)

Augmenting the user experience through active tactile feedback is a common research topic. Figure 2.8b shows a mouse instrumented with a solenoid-driven pin below the index finger (Akamatsu et al., 1995). The pin is actuated (pulsed) when the mouse cursor crosses a boundary, such as the edge of a soft button or window. The added tactile feedback helps inform and guide the interaction and potentially reduces the demand on the visual channel. A common use of tactile feedback in mobile phones is vibration, signaling an incoming call or message. (See Figure 2.8c.)

2.3.4 Smell and taste

Smell (olfaction) is the ability to perceive odors. For humans, this occurs through sensory cells in the nasal cavity. Taste (gustation) is a direct chemical reception of sweet, salty, bitter, and sour sensations through taste buds in the tongue and oral cavity. Flavor is a perceptual process in the brain that occurs through a partnering of the

FIGURE 2.8

Tactile feedback: (a) Identifier on key top. (b) Solenoid-driven pin under the index finger. (c) Vibration signals an incoming call.

(Adapted from Akamatsu, MacKenzie, and Hasbroucq, 1995)

smell and taste senses. Although smell and taste are known intuitively by virtually all humans—and with expert-like finesse—they are less understood than the visual and auditory senses. Complex smells and tastes can be built up from simpler elements, but the perceptual processes for this remain a topic of research. For example, classification schemes have been developed for specific industries (e.g., perfume, wine) but these do not generalize to human experiences with other smells and tastes.

While humans use smell and taste all the time without effort, these senses are not generally “designed in” to systems. There are a few examples in HCI. Brewster et al. (2006) studied smell as an aid in searching digital photo albums. Users employed two tagging methods, text and smell, and then later used the tags to answer questions about the photos. Since smell has links to memory, it was conjectured that smell cues might aid in recall. In the end, recall with smell tags was poorer than with word tags. Related work is reported by Bodnar et al. (2004) who compared smell, auditory, and visual modalities for notifying users of an interruption by an incoming message. They also found poorer performance with smell. Notable in both examples, though, is the use of an empirical research methodology to explore the potential of smell in a user interface. Both studies included all the hallmarks of experimental research, including an independent variable, dependent variables, statistical significance testing, and counterbalancing of the independent variable.

2.3.5 Other senses

The word sense appears in many contexts apart from the five senses discussed above. We often hear of a sense of urgency, a sense of direction, musical sense, intuitive sense, moral sense, or even common sense. The value of these and related senses to HCI cannot be overstated. Although clearly operating at a higher level than the five primary senses, these additional senses encapsulate how humans feel about their interactions with computers. Satisfaction, confidence, frustration, and so on, are clearly steeped in how users feel about computing experiences. Are there receptors that pick up these senses, like cells in the nasal cavity? Perhaps. It has been argued and supported with experimental evidence that humans may have a moral sense that is like our sense of taste (Greene and Haidt, 2002). We have natural receptors that help us pick up sweetness and saltiness. In the same way, we may have natural receptors that help us recognize fairness and cruelty. Just as a few universal tastes can grow into many different cuisines, a few moral senses can grow into different moral cultures.

2.4 Responders

Through movement, or motor control, humans are empowered to affect the environment around them. Control occurs through responders. Whether using a finger to text1 or point, the feet to walk or run, the eyebrows to frown, the vocal cords to speak, or the torso to lean, movement provides humans with the power to engage and affect the world around them. Penfield’s motor homunculus is a classic illustration of human responders (Penfield and Rasmussen, 1990). (See Figure 2.9.) The illustration maps areas in the cerebral motor cortex to human responders. The lengths of the underlying solid bars show the relative amount of cortical area devoted to each muscle group. As the bars reveal, the muscles controlling the hand and fingers are highly represented compared to the muscles responsible for the wrist, elbow, and shoulders. Based partially on this information, Card et al. (1991) hypothesized that “those groups of muscles having a large area devoted to them are heuristically promising places to connect with input device transducers if we desire high performance,” although they rightly caution that “the determinants of muscle performance are more complex than just simple cortical area” (Card et al., 1991, p. 111). (See also Balakrishnan and MacKenzie, 1997.)

See also student exercise 2-1 at the end of this chapter.

2.4.1 Limbs

Human control over machines is usually associated with the limbs, particularly the upper body limbs. The same is true in HCI. With fingers, hands, and arms we

1 “Text” is now an accepted verb in English. “I’ll text you after work,” although strange in the 1980s, is understood today as sending a text message (SMS) on a mobile phone.

FIGURE 2.9

Motor homunculus showing human responders and the corresponding cortical area.

(Adapted from Penfield and Rasmussen, 1990)

type on keyboards, maneuver mice and press buttons, hold mobile phones and press keys, touch and swipe the surfaces of touchscreen phones and tablets, and wave game controllers in front of displays. Of course, legs and feet can also act as responders and provide input to a computer. For users with limited or no use of their arms, movement of the head can control an on-screen cursor. Some example scenarios are seen in Figure 2.10.

Movement of the limbs is tightly coupled to the somatosensory system, particularly proprioception (the coordination of limb movement and position through the perception of stimuli within muscles and tendons), to achieve accuracy and finesse as body parts move relative to the body as a whole. Grasping a mouse without looking at it and typing without looking at the keyboard are examples.

In Figure 2.10a, the user’s left hand grips the mouse. Presumably this user is left-handed. In Figure 2.10b, the user’s right index finger engages the surface of the touchpad. Presumably, this user is right-handed. Interestingly enough, handedness, or hand dominance, is not an either-or condition. Although 8 to 15 percent of people are deemed left-handed, handedness exists along a continuum, with people considered, by degree, left-handed or right-handed. Ambidextrous people are substantially indifferent in hand preference.

FIGURE 2.10

Use of the limbs in HCI: (a) Hands. (b) Fingers. (c) Thumbs. (d) Arms. (e) Feet. (f) Head.

(sketches a and d courtesy of Shawn Zhang; e, adapted from Pearson and Weiser, 1986)

A widely used tool to assess handedness is the Edinburgh Handedness Inventory, dating to 1971 (Oldfield, 1971). The inventory is a series of self-assessments of the degree of preference one feels toward the left or right hand in doing common tasks, such as throwing a ball. The inventory is shown in Figure 2.11

FIGURE 2.11

Instructions. Mark boxes as follows: x = preference; xx = strong preference; blank = no preference.

Scoring. Add up the number of checks in the “Left” and “Right” columns and enter in the “Total” row for each column. Add the left total and the right total and enter in the “Cumulative Total” cell. Subtract the left total from the right total and enter in the “Difference” cell. Divide the “Difference” cell by the “Cumulative Total” cell (round to 2 digits if necessary) and multiply by 100. Enter the result in the “RESULT” cell.

Interpretation of RESULT:
–100 to –40: left-handed
–40 to +40: ambidextrous
+40 to +100: right-handed

Edinburgh Handedness Inventory for hand dominance assessment (Oldfield, 1971).

along with the instructions, scoring, and interpretation of results.2 People scoring −100 to −40 are considered left-handed, whereas those scoring +40 to +100 are considered right-handed. People scoring −40 to +40 are considered ambidextrous.
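The inventory’s arithmetic is simple enough to express directly. The sketch below follows the scoring and interpretation rules given in Figure 2.11; how ties at exactly ±40 are classified is a convention chosen here, since the stated ranges overlap at the boundaries.

```python
def handedness_score(left_checks, right_checks):
    """Edinburgh Handedness Inventory score: 100 * (right - left) /
    (right + left), where left/right are the column check totals."""
    return round(100 * (right_checks - left_checks)
                 / (right_checks + left_checks))

def classify(score):
    """Interpret a score per Figure 2.11. Boundary scores of exactly
    +/-40 are assigned to the handed categories here by convention."""
    if score <= -40:
        return "left-handed"
    if score >= 40:
        return "right-handed"
    return "ambidextrous"

# A respondent marking every task right-handed scores +100.
print(handedness_score(0, 20), classify(handedness_score(0, 20)))
```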

There are several examples in HCI where the Edinburgh Handedness Inventory was administered to participants in experiments (Hancock and Booth, 2004; Hegel, Krach, Kircher, Wrede, and Sagerer, 2008; Kabbash, MacKenzie, and Buxton, 1993; Mappus, Venkatesh, Shastry, Israeli, and Jackson, 2009; Masliah and Milgram, 2000; Matias, MacKenzie, and Buxton, 1996). In some cases, the degree of handedness is reported. For example, Hinckley et al. (1997) reported that all participants in their study were “strongly right-handed,” with a mean score of 71.7 on the inventory.

Handedness is often relevant in situations involving touch- or pressure-sensing displays. If interaction requires a stylus or finger on a display, then the user’s hand may occlude a portion of the display. Occlusion may lead to poorer performance (Forlines and Balakrishnan, 2008) or to a “hook posture” where users contort the arm position to facilitate interaction (Vogel and Balakrishnan, 2010). This can be avoided by positioning UI elements in a different region on the display (Hancock and Booth, 2004; Vogel and Baudisch, 2007). Of course, this requires sensing or determining the handedness of the user, since the occlusion is different for a left-handed user than for a right-handed user.

2 Dragovic (2004) presents an updated version of the Edinburgh Inventory, using more contemporary and widely understood tasks.

2.4.2 Voice

The human vocal cords are responders. Through the combination of movement in the larynx, or voice box, and pulmonary pressure in the lungs, humans can create a great variety of sounds. The most obvious form of vocalized sound—speech—is the primary channel for human communication. As an input modality, the speech must be recognized by algorithms implemented in software running on the host computer. With this modality, the computer interprets spoken words as though the same words were typed on the system’s keyboard. Vertanen and Kristensson (2009) describe a system for mobile text entry using automatic speech recognition. They report entry rates of 18 words per minute while seated and 13 words per minute while walking.

Computer input is also possible using non-speech vocalized sounds, a modality known as non-verbal voice interaction (NVVI). In this case, various acoustic parameters of the sound signal, such as pitch, volume, or timbre, are measured over time and the data stream is interpreted as an input channel. The technique is particularly useful to specify analog parameters. For example, a user could produce an utterance, such as “volume up, aaah.” In response, the system increases the volume of the television set for as long as the user sustains “aaah” (Igarashi and Hughes, 2001). Harada et al. (2006) describe the vocal joystick—a system using NVVI to simulate a joystick and control an on-screen cursor (e.g., “eee” = move cursor left). Applications are useful primarily for creating accessible computing for users without a manual alternative.

2.4.3 Eyes

In the normal course of events, the human eye receives sensory stimuli in the form of light from the environment. In viewing a scene, the eyes combine fixations, to view particular locations, and saccades, to move to different locations. This was noted earlier in considering the eye as a sensory organ. However, the eye is also capable of acting as a responder—controlling a computer through fixations and saccades. In this capacity, the eye is called upon to do double duty since it acts both as a sensor and as a responder. The idea is illustrated in Figure 2.12, which shows a modified view of the human-computer interface (see Figure 2.2 for comparison). The normal path from the human to the computer is altered. Instead of the hand providing motor responses to control the computer through physical devices (set in grey), the eye provides motor responses that control the computer through soft controls—virtual or graphical controls that appear on the system’s display.

For computer input control using the eyes, an eye tracking apparatus is required to sense and digitize the gaze location and the movement of the eyes. The eye tracker is usually configured to emulate a computer mouse. Much like point-select operations with a mouse, the eye can look-select, and thereby activate soft controls such as buttons, icons, links, or text (e.g., Zhang and MacKenzie, 2007). The most common method for selecting with the eye is by fixating, or dwelling, on a selectable target for a predetermined period of time, such as 750 ms.
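Dwell-based selection can be sketched as a small state machine: track which target the gaze is on, and fire a selection once gaze has stayed on the same target for the dwell threshold. Everything below is illustrative (the data format, the rectangle targets, and the use of 750 ms as a parameter); it is a sketch of the idea, not any particular eye tracker’s API.

```python
def dwell_select(gaze_samples, targets, dwell_time=0.75):
    """Report targets selected by dwelling on them for dwell_time seconds.

    gaze_samples: list of (t, x, y) tuples, t in seconds.
    targets: dict of name -> (x, y, width, height) rectangles.
    Returns a list of (t, name) selection events.
    """
    def hit(x, y):
        for name, (tx, ty, w, h) in targets.items():
            if tx <= x <= tx + w and ty <= y <= ty + h:
                return name
        return None

    current, start = None, None
    selections = []
    for t, x, y in gaze_samples:
        target = hit(x, y)
        if target != current:
            current, start = target, t        # gaze moved: restart dwell timer
        elif target is not None and t - start >= dwell_time:
            selections.append((t, target))    # dwell threshold met: select
            current, start = None, None       # reset to require a fresh dwell
    return selections
```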

FIGURE 2.12

The human-computer interface with an eye tracker. The eye serves double duty, processing sensory stimuli from computer displays and providing motor responses to control the system.

FIGURE 2.13

Eye typing: (a) Apparatus. (b) Example sequence of fixations and saccades (Majaranta et al., 2006).

Text entry is one application of eye tracking for input control. So-called eye typing uses an on-screen keyboard. The user looks at soft keys, fixating for a prescribed dwell time to make a selection. An example setup using an iView X RED-III eye tracking device by SensoMotoric Instruments (www.smivision.com) is shown in Figure 2.13a. Figure 2.13b shows a sequence of fixations and saccades (a scanpath) for one user while entering a phrase of text (Majaranta, MacKenzie, Aula, and Räihä, 2006). Straight lines indicate saccades. Circles indicate fixations, with the diameter indicating the duration of the fixation. Bear in mind that the fixations here are conscious, deliberate acts for controlling a computer interface. This is different from the fixations shown in Figure 2.7, where the user was simply viewing content on a web page. In Figure 2.13b, the interaction includes numerous fixations meeting the required dwell time criterion to select soft keys. There is also a fixation (with two corresponding saccades) to view the typed text.

2.5 The brain

The brain is the most complex biological structure known. With billions of neurons, the brain provides humans with a multitude of capacities and resources, including pondering, remembering, recalling, reasoning, deciding, and communicating. While sensors (human inputs) and responders (human outputs) are nicely mirrored, it is the brain that connects them. Without sensing or experiencing the environment, the brain would have little to do. However, upon experiencing the environment through sensors, the brain’s task begins.

2.5.1 Perception

Perception, the first stage of processing in the brain, occurs when sensory signals are received as input from the environment. It is at the perceptual stage that associations and meanings take shape. An auditory stimulus is perceived as harmonious or discordant. A smell is pleasurable or abhorrent. A visual scene is familiar or strange. Touch something and the surface is smooth or rough, hot or cold. With associations and meaning attached to sensory input, humans are vastly superior to the machines they interact with:

People excel at perception, at creativity, at the ability to go beyond the information given, making sense of otherwise chaotic events. We often have to interpret events far beyond the information available, and our ability to do this efficiently and effortlessly, usually without even being aware that we are doing so, greatly adds to our ability to function.

(Norman, 1988, p. 136)

Since the late 19th century, perception has been studied in a specialized area of experimental psychology known as psychophysics. Psychophysics examines the relationship between human perception and physical phenomena. In a psychophysics experiment, a human is presented with a physical stimulus and is then asked about the sensation that was felt or perceived. The link is between a measurable property of a real-world phenomenon that stimulates a human sense and the human’s subjective interpretation of the phenomenon. A common experimental goal is to measure the just noticeable difference (JND) in a stimulus. A human subject is presented with two stimuli, one after the other. The stimuli differ in a physical property, such as frequency or intensity, and the subject is asked if the stimuli are the same or different. The task is repeated over a series of trials with random variations in the magnitude of the difference in the physical property manipulated. Below a

FIGURE 2.14

Ambiguous images: (a) Necker cube. (b) Rubin vase.

certain threshold, the difference between the two stimuli is so small that it is not perceived by the subject. This threshold is the JND. JND has been highly researched for all the human senses and in a variety of contexts. Does the JND depend on the absolute magnitude of the stimuli (e.g., high intensity stimuli versus low intensity stimuli)? Does the JND on one property (e.g., intensity) depend on the absolute value of a second property (e.g., frequency)? Does the JND depend on age, gender, or other property of the human? These are basic research questions that, on the surface, seem far afield from the sort of research likely to bear on human-computer interfaces. But over time and with new research extending results from previous research, there is indeed an application to HCI. For example, basic research in psychophysics is used in algorithms for audio compression in MP3 audio encoding.
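A minimal way to summarize such trials is to tabulate, for each tested difference magnitude, the proportion of trials on which the subject reported a difference, and take the smallest magnitude detected at or above some criterion. The 50 percent criterion and the simple (magnitude, detected) trial format below are assumptions for illustration; real psychophysics uses more careful threshold definitions, such as fitted psychometric functions.

```python
def estimate_jnd(trials, criterion=0.5):
    """Estimate a just noticeable difference from same/different trials.

    trials: list of (difference_magnitude, detected) pairs, where
    detected is True if the subject reported the stimuli as different.
    Returns the smallest tested magnitude whose detection proportion
    meets the criterion, or None if none does.
    """
    by_magnitude = {}
    for magnitude, detected in trials:
        by_magnitude.setdefault(magnitude, []).append(detected)
    for magnitude in sorted(by_magnitude):
        hits = by_magnitude[magnitude]
        if sum(hits) / len(hits) >= criterion:
            return magnitude
    return None
```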

Another property of perception is ambiguity—the human ability to develop multiple interpretations of a sensory input. Ambiguous images provide a demonstration of this ability for the visual sense. Figure 2.14a shows the Necker wire-frame cube. Is the top-right corner on the front surface or the back surface? Figure 2.14b shows the Rubin vase. Is the image a vase or two faces? The very fact that we sense ambiguity in these images reveals our perceptual ability to go beyond the information given.

Related to ambiguity is illusion, the deception of common sense. Figure 2.15a shows Ponzo lines. The two black lines are the same length; however, the black line near the bottom of the illustration appears shorter because of the three-dimensional perspective. Müller-Lyer arrows are shown in Figure 2.15b. In comparing the straight-line segments in the two arrows, the one in the top arrow appears longer when in fact both are the same length. Our intuition has betrayed us.

If illusions are possible in visual stimuli, it is reasonable to expect illusions in the other senses. An example of an auditory illusion is the Shepard musical scale. It is perceived by humans to rise or fall continuously, yet it somehow stays the same. A variation is a continuous musical tone known as the Shepard-Risset glissando—a tone that continually rises in pitch while also continuing to stay at the same pitch. Figure 2.16 illustrates this illusion. Each vertical line represents a sine

FIGURE 2.15

Visual illusion: (a) Ponzo lines. (b) Müller-Lyer arrows.

FIGURE 2.16

Auditory illusion. A collection of equally spaced sine waves rise in frequency. The human hears a tone that rises but stays the same.

wave. The height of each line is the perceived loudness of the sine wave. Each wave is displaced from its neighbor by the same frequency; thus, the waves are harmonics of a musical note with a base frequency equal to the displacement. This is the frequency of the single tone that a human perceives. If the sine waves collectively rise in frequency (block arrows in the figure), there is a sense that the tone is rising. Yet because the sine waves are equally spaced, there is a competing sense that the tone remains the same (because the frequency perceived is the distance between harmonics). Sine waves at the high end of the frequency distribution fade out, while new sine waves enter at the low end. Examples of the Shepard scale and the Shepard-Risset glissando can be heard on YouTube.
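The figure’s construction can be sketched directly: generate sine components a fixed frequency apart, slide them all upward together, wrap the slide so the spectrum repeats, and weight each component by a bell-shaped envelope so components fade out at the high end and in at the low end. All parameter values below are illustrative, not taken from the figure.

```python
import math

def glissando_sample(t, spacing=100.0, n=10, rise_hz_per_s=20.0):
    """One sample of a rising-tone illusion in the spirit of Figure 2.16.

    Sine components spaced `spacing` Hz apart all rise together; the
    rise wraps every `spacing` Hz so the spectrum repeats, and a
    bell-shaped envelope fades components out at the high end and in
    at the low end. The perceived pitch, set by the spacing between
    components, stays the same.
    """
    shift = (t * rise_hz_per_s) % spacing   # wraps so the spectrum repeats
    total = 0.0
    for k in range(1, n + 1):
        freq = k * spacing + shift
        position = (k - 1 + shift / spacing) / n   # 0 = low end, 1 = high end
        amp = math.sin(math.pi * position) ** 2    # bell-shaped envelope
        total += amp * math.sin(2 * math.pi * freq * t)
    return total
```

Feeding successive samples (e.g., `glissando_sample(i / 44100)` for increasing `i`) to an audio output would produce the continuously rising yet unchanging tone the text describes.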

Tactile or haptic illusions also exist. A well-documented example is the “phantom limb.” Humans who have lost a limb through amputation often continue to sense that the limb is present and that it moves along with other body parts as it did before amputation (Halligan, Zeman, and Berger, 1999).

Beyond perception, sensory stimuli are integrated into a myriad of other experiences to yield ideas, decisions, strategies, actions, and so on. The ability to excel at these higher-level capabilities is what propels humans to the top tier in classification schemes for living organisms. By and large it is the human ability to think and reason that affords this special position.


FIGURE 2.17

Cognitive operation in a reaction time task: (a) Problem schematic. (b) Sequence of operations (Bailey, 1996, p. 41).

2.5.2 Cognition

Among the brain’s vital faculties is cognition—the human process of conscious intellectual activity, such as thinking, reasoning, and deciding. Cognition spans many fields—from neurology to linguistics to anthropology—and, not surprisingly, there are competing views on the scope of cognition. Does cognition include social processes, or is it more narrowly concerned with deliberate goal-driven acts such as problem solving? It is beyond the reach of this book to unravel the many views of cognition. The task is altogether too great and in any case is aptly done in other references, many of them in human factors (e.g., B. H. Kantowitz and Sorkin, 1983; Salvendy, 1987; Wickens, 1987).

Sensory phenomena such as sound and light are easy to study because they exist in the physical world. Instruments abound for recording and measuring the presence and magnitude of sensory signals. Cognition occurs within the human brain, so studying cognition presents special challenges. For example, it is not possible to directly measure the time it takes for a human to make a decision. When does the measurement begin and end? Where is it measured? On what input is the human deciding? Through what output is the decision conveyed? The latter two questions speak to a sensory stimulus and a motor response that bracket the cognitive operation. Figure 2.17a illustrates this. Since sensory stimuli and motor responses are observable and measurable, the figure conveys, in a rough sense, how to measure a cognitive operation. Still, there are challenges. If the sensory stimulus is visual, the retina converts the light to neural impulses that are transmitted to the brain for perceptual processing. This takes time. So the beginning of the cognitive operation is not precisely known. Similarly, if the motor response involves a finger pressing a button, neural associations for the response are developed in the brain with nerve signals transmitted to the hand before movement begins. So the precise ending of the cognitive operation is also unknown. This sequence of events is shown in Figure 2.17b, noting the operations and the typical time for each step. The most remarkable observation here is the wide range of values—an indication of the difficulty in pinpointing where and how the measurements are made. Despite these challenges, techniques exist for measuring the duration of cognitive operations. These are discussed shortly.

The range of cognitive operations applicable to Figure 2.17 is substantial. While driving a car, the decision to depress a brake pedal in response to a changing signal light is simple enough. Similar scenarios abound in HCI. While using a mobile phone, one might decide to press the reject call key in response to an incoming call. While reading the morning news online, one might decide to click the close button on a popup ad. While editing a document, one might switch to e-mail in response to an audio alert of a new message. Each of these examples involves a sensory stimulus, a cognitive operation, and a motor response.

Other decisions are more complicated. While playing the card game 21 (aka Blackjack), perhaps online,3 if a card is drawn and the hand then totals 16, the decision to draw another card is likely to produce a cognitive pause. What is the chance the next card will bring the hand above 21? Which cards, 6 through king, are already dealt? Clearly, the decision in this scenario goes beyond the information in the sensory stimulus. There are strategies to consider, as well as the human ability to remember and recall past events—cards previously dealt. This ability leads us to another major function of the brain—memory.

2.5.3 Memory

Memory is the human ability to store, retain, and recall information. The capacity of our memory is remarkable. Experiences, whether from a few days ago or from decades past, are collected together in the brain’s vast repository known as long-term memory. Interestingly enough, there are similarities between memory in the brain and memory in a computer. Computer memory often includes separate areas for data and code. In the brain, memory is similarly organized. A declarative/explicit area stores information about events in time and objects in the external world. This is similar to a data space. An implicit/procedural area in the brain’s memory stores information about how to use objects or how to do things. This is similar to a code space.4

Within long-term memory is an active area for short-term memory or working memory. The contents of working memory are active and readily available for access. The amount of such memory is small, about seven units, depending on the task and the methodology for measurement. A study of short-term memory was published in 1956 in a classic essay by Miller, aptly titled “The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information” (G. A. Miller, 1956).5 Miller reviewed a large number of studies on the absolute judgment of stimuli, such as pitch in an auditory stimulus or salt concentration in water in a taste stimulus. Humans are typically able to distinguish about seven levels of a uni-dimensional stimulus.6

3 The parenthetic “perhaps online” is included as a reminder that many activities humans do in the physical world have a counterpart in computing, often on the Internet.

4 The reader is asked to take a cautious and loose view of the analogy between human memory and computer memory. Attempts to formulate analogies from computers to humans are fraught with problems. Cognitive scientists, for example, frequently speak of human cognition in terms of operators, operands, cycles, registers, and the like, and build and test models that fit their analogies. Such reverse anthropomorphism, while tempting and convenient, is unlikely to reflect the true inner workings of human biology.

FIGURE 2.18

Results of a test of short-term memory.

Miller extended this work to human memory, describing an experiment where participants were presented with a sequence of items and then asked to recall the items. He found that the human ability with such tasks is, similarly, about seven items (±2). A simple demonstration of Miller’s thesis is shown in Figure 2.18. For this “mini-experiment,” log sheets were distributed to students in a class on human-computer interaction (n ≈ 60). The instructor dictated sequences of random digits, with sequences varying in length from four digits to 13 digits. After each dictation, students copied the sequence from short-term memory onto the log sheet. The percentage of correct responses by sequence length is shown in the figure. At length seven the number of correct responses was about 50 percent. At lengths five and nine the values were about 90 percent and 20 percent, respectively.7 See also student exercise 2-2 at the end of this chapter.
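The strict scoring rule noted in footnote 7 is easy to express in code. The sketch below is a hypothetical Python illustration (the original mini-experiment used paper log sheets, not software); the names `recall_correct` and `percent_correct` are inventions for this example.

```python
def recall_correct(dictated, response):
    """Strict scoring: a response counts as correct only if every item
    matches the dictated sequence, in order (see footnote 7)."""
    return list(dictated) == list(response)

def percent_correct(trials):
    """trials: list of (dictated, response) pairs for one sequence length."""
    correct = sum(recall_correct(d, r) for d, r in trials)
    return 100.0 * correct / len(trials)

# Three hypothetical responses to the 7-digit sequence "4192738"
trials = [
    ("4192738", "4192738"),  # all seven recalled: correct
    ("4192738", "4192783"),  # last two digits transposed: incorrect
    ("4192738", "419273"),   # one item dropped: incorrect
]
print(round(percent_correct(trials), 1))  # 33.3
```

Under this all-or-nothing criterion, “mostly correct” responses (five or six of seven items) score zero, which is why the curve in Figure 2.18 falls so steeply beyond length seven.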

5 Miller’s classic work is referred to as an essay rather than a research paper. The essay is casual in style and, consequently, written in the first person; for example, “I am simply pointing to the obvious fact that…” (G. A. Miller, 1956, p. 93). Research papers, on the other hand, are generally plain in style and avoid first-person narratives (cf. “This points to the fact that…”).

6 The human ability to distinguish levels is greater if the stimulus is multidimensional; that is, the stimulus contains two or more independent attributes, such as a sound that varies in pitch and intensity.

7 A response was deemed correct only if all the items were correctly recalled. For the longer sequences, many responses were “mostly correct.” For example, at sequence length = 7, many of the responses had five or six items correct.

Miller extended his work by revealing and analyzing a simple but powerful process within the brain: our ability to associate multiple items as one. So-called chunking is a process whereby humans group a series of low-level items into a single high-level item. He described an example using binary digits. For example, a series of 16 bits, such as 1000101101110010, would be extremely difficult to commit to memory. If, however, the bits are collected into groups of four and chunked into decimal digits, the pattern is much easier to remember: 1000101101110010 → 1000, 1011, 0111, 0010 → 8, 11, 7, 2. Card et al. (1983, 36) give the example of BSCBMICRA. At nine units, the letter sequence is beyond the ability of most people to repeat back. But the sequence is similar to the following three groups of three-letter sequences: CBS IBM RCA. Shown like this, the sequence contains three chunks and is relatively easy to remember, provided the person can perform the recoding rapidly enough. The process of chunking is mostly informal and unstructured. Humans intuitively build up chunked structures recursively and hierarchically, leading to complex organizations of memory in the brain.
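The binary-to-decimal recoding in Miller’s example can be demonstrated in a few lines of Python. This is an illustrative sketch; the function name `chunk_bits` is an invention for this example.

```python
def chunk_bits(bits, size=4):
    """Group a bit string into fixed-size chunks and recode each chunk
    as a decimal value, as in Miller's chunking example."""
    groups = [bits[i:i + size] for i in range(0, len(bits), size)]
    return [int(g, 2) for g in groups]  # int(..., 2) parses binary

print(chunk_bits("1000101101110010"))  # [8, 11, 7, 2]
```

Sixteen bits exceed the capacity of working memory, but the four recoded values fit comfortably within Miller’s seven-unit limit.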

2.6 Language

Language—the mental faculty that allows humans to communicate—is universally available to virtually all humans. Remarkably, language as speech is available without effort. Children learn to speak and understand speech without conscious effort as they grow and develop. Writing, as a codification of language, is a much more recent phenomenon. Learning to write demands effort, considerable effort, spanning years of study and practice. Daniels and Bright distinguish language and writing as follows: “Humankind is defined by language; but civilization is defined by writing” (Daniels and Bright, 1996, p. 1). These words are a reminder that the cultural and technological status associated with civilization is enabled by systems of writing. Indeed, the term prehistory, as applied to humans, spans from the arrival of human-like beings, millions of years ago, to the emergence of writing. It is writing that ushered in recorded history, beginning a mere six thousand years ago.

In HCI, our interest in language is primarily in systems of writing and in the technology that enables communication in a written form. Text is the written material on a page or display. How it gets there is a topic that intrigues and challenges HCI researchers, as well as the engineers and designers who create products that support text creation, or text entry. Although text entry is hugely important in HCI, our interest here is language itself in a written form.

One way to characterize and study a language in its written form is through a corpus—a large collection of text samples gathered from diverse and representative sources such as newspapers, books, e-mails, and magazines. Of course, it is not possible for a corpus to broadly yet precisely represent a language. The sampling process brings limitations: During what timeframe were the samples written? In what country? In what region of the country? On what topics are the samples focused and who wrote them? A well-known corpus is the British National Corpus

Word rank   English   French      German        Finnish    SMS English   SMS Pinyin
1           the       de          der           ja         u             wo ( )
2           of        la          die           on         i             ni ( )
3           and       et          und           ei         to            le ( )
4           a         le          in            että       me            de ( )
5           in        à           den           oli        at            bu ( )
1000        top       ceci        konkurrenz    muista     ps            jiu ( )
1001        truth     mari        stieg         paikalla   quit          tie ( )
1002        balance   solution    notwendig     varaa      rice          ji ( )
1003        heard     expliquer   sogenannte    vie        sailing       jiao ( )
1004        speech    pluie       fahren        seuran     sale          ku ( )

FIGURE 2.19

Sample words from word-frequency lists in various languages.

(BNC), which includes samples totaling 100 million words.8 The sources are written in British English and are from the late 20th century. So analyses gleaned from the BNC, while generally applicable to English, may not precisely apply, for example, to American English, to present day English, or to the language of teenagers sending text messages.

To facilitate study and analysis, a corpus is sometimes reduced to a word-frequency list, which tabulates unique words and their frequencies in the corpus. One such reduction of the BNC includes about 64,000 unique words with frequencies totaling 90 million (Silfverberg, MacKenzie, and Korhonen, 2000). Only words occurring three or more times in the original corpus are included. The most frequent word is the, representing about 6.8 percent of all words.
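The reduction from corpus to word-frequency list can be sketched in Python. The snippet below is illustrative only: it uses a toy sample rather than the BNC, and the tokenization rule (lowercased runs of letters and apostrophes) is a simplifying assumption; real corpus reductions involve more careful tokenization.

```python
import re
from collections import Counter

def word_frequency_list(corpus_text, min_count=1):
    """Reduce a text sample to a word-frequency list: unique words with
    counts, most frequent first. The BNC reduction cited above applies a
    similar cutoff, discarding words occurring fewer than three times."""
    words = re.findall(r"[a-z']+", corpus_text.lower())
    counts = Counter(words)
    return [(w, n) for w, n in counts.most_common() if n >= min_count]

sample = "the cat sat on the mat and the dog sat by the door"
freqs = word_frequency_list(sample)
print(freqs[0])  # ('the', 4)
```

Even in this tiny sample, the dominates, echoing its 6.8 percent share of the full BNC.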

Figure 2.19 includes excerpts from several corpora, showing the five most frequently used words and the words ranked from 1000 to 1004. The English entries are from the British National Corpus. There are additional columns for French (New, Pallier, Brysbaert, and Ferrand, 2004), German (Sporka et al., 2011), Finnish, SMS English, and SMS Pinyin (Y. Liu and Räihä, 2010). The Finnish entries are from a database of text from a popular newspaper in Finland, Turun Sanomat. The SMS English entries are from a collection of about 10,000 text messages, mostly from students at the University of Singapore.9 SMS text messaging is a good example of the dynamic and context-sensitive nature of language. Efforts to characterize SMS English are prone to the limitations noted above. Note that there is no overlap in the entries 1–5 under English and SMS English.

The right-hand column in Figure 2.19 is for SMS Pinyin. Pinyin, the standard coding system since 1958, uses the Latin alphabet to represent Mandarin Chinese characters. The entries are pinyin marks, not words. Each mark maps to the Chinese character shown in parentheses. The entries are from a corpus of 630,000 text messages containing over nine million Chinese characters.

8 See www.natcorp.ox.ac.uk.

9 Available at www.comp.nus.edu.sg/~rpnlpir/smsCorpus.

FIGURE 2.20

First paragraph of Oscar Wilde’s The Picture of Dorian Gray: (a) Vowels removed. (b) Vowels intact at beginning of words. (c) Original.

A notable feature of some corpora is part-of-speech (POS) tagging, where words are tagged by their category, such as noun, verb, and adjective. Importantly, the part of speech is contextual, reflecting a word’s use in the original text. For example, paint is sometimes a verb (Children paint with passion), sometimes a noun (The paint is dry). POS tagging can be important in predictive systems where knowing a word’s POS limits the possibilities for the next word (Gong, Tarasewich, and MacKenzie, 2008).

2.6.1 Redundancy in language

Native speakers of a language innately possess an immense understanding of the statistics of the language. We automatically insert words that are omitted or obscured (ham and ____ sandwich). We anticipate words (a picture is worth a thousand _____), letters (questio_), or entire phrases (to be or ___ __ __). We might wonder: since humans can fill in missing letters or words, perhaps the unneeded portions can be omitted. Let’s consider this further. The example in Figure 2.20 gives three variations of a paragraph of text. The original excerpt contains 243 characters. In part (a), all 71 vowels are removed, thus shortening the text by 29.2 percent. Many words are easily guessed (e.g., smmr → summer, thrgh → through) and with some effort the gist of the text is apparent. It has something to do with summer [smmr], gardens [grdn], and scent [scnt]. Part (b) is similar except the first letter of each word is intact, even if it is a vowel. Still, 62 vowels are missing. The meaning

FIGURE 2.21


Shortening English: (a) SMS shorthand. (b) Standard English.

is slightly easier to decipher. The original text is given in (c). It is the first paragraph from Oscar Wilde’s The Picture of Dorian Gray.
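The vowel-removal transformations in parts (a) and (b) of Figure 2.20 can be approximated with a short Python sketch. The sample sentence is invented for illustration (it is not Wilde’s text), and `remove_vowels` is a hypothetical helper.

```python
VOWELS = set("aeiouAEIOU")

def remove_vowels(text, keep_word_initial=False):
    """Drop vowels from text. With keep_word_initial=True, a vowel that
    begins a word survives, as in part (b) of Figure 2.20."""
    out = []
    prev = " "  # character before the current one
    for ch in text:
        is_initial = not prev.isalpha()  # start of a word?
        if not (ch in VOWELS and not (keep_word_initial and is_initial)):
            out.append(ch)
        prev = ch
    return "".join(out)

s = "The summer wind came through the garden"
short = remove_vowels(s)
print(short)  # Th smmr wnd cm thrgh th grdn
saving = 100 * (len(s) - len(short)) / len(s)
print(round(saving, 1))  # 28.2 -- close to the 29.2 percent above
```

Despite the loss of over a quarter of the characters, the gist survives, which is the redundancy at work.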

There are other examples, as above, where portions of text are removed, yet comprehension remains. SMS text messaging is a well-documented example. In addition to removing characters, recoding is often used. There are numerous techniques employed, such as using sound (th@s → that’s, gr8 → great) or invented acronyms (w → with, gf → girlfriend, x → times) (Grinter and Eldridge, 2003). One anecdote tells of a 13-year-old student who submitted an entire essay written in SMS shorthand.10 Although the teacher was not impressed, the student’s rationale was direct and honest: it is easier to write in shorthand than in standard English. An example from the essay is shown in Figure 2.21. Part (a) gives the shortened text. There are 26 words and 102 characters (including spaces). The expanded text in (b) contains 39 words and 199 characters. The reduction is dramatic: 48.7 percent fewer characters in the SMS shorthand. Of course, there are differences between this example and Figure 2.20. For instance, in this example, punctuation and digits are introduced for recoding. As well, the shortened message is tailored to the language of a particular community of users. It is likely the 13-year-old’s teacher was not of that community.

There is, unfortunately, a more insidious side to redundancy in written text. A common fault in writing is the presence of superfluous words, with their eradication promoted in many books on writing style. Strunk and White’s Rule 17 is Omit Needless Words, and advises reducing, for example, “he is a man who” to “he,” or “this is a subject that” to “this subject” (Strunk and White, 2000, p. 23). Tips on writing style are given in Chapter 8.

2.6.2 Entropy in language

If redundancy in language is what we inherently know, entropy is what we don’t know—the uncertainty about forthcoming letters, words, phrases, ideas, concepts, and so on. Clearly, redundancy and entropy are related: if we remove what we know, what remains is what we don’t know. A demonstration of redundancy and entropy in written English was provided in the 1950s by Shannon in a letter-guessing experiment (Shannon, 1951). (See Figure 2.22.) The experiment proceeds as follows. The participant is asked to guess the letters in a phrase, starting at the beginning. As guessing proceeds, the phrase is revealed to the participant, letter by letter. The results are recorded as shown in the line below each phrase in the figure. A dash (“-”) is a correct guess; a letter is an incorrect guess. Shannon called the second line the “reduced text.” In terms of redundancy and entropy, a dash represents redundancy (what is known), while a letter represents entropy (what is not known). Among the interesting observations in Figure 2.22 is that errors are more common at the beginning of words and less common as words progress. The statistical nature of the language and the participant’s inherent understanding of the language facilitate guessing within words.

10 news.bbc.co.uk/2/hi/uk_news/2814235.stm.

FIGURE 2.22

Shannon’s letter-guessing experiment.

(Adapted from Shannon, 1951)
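Producing the “reduced text” from a record of guesses is straightforward. The Python sketch below is illustrative; the phrase and the guessing record are invented for the example, not taken from Shannon’s data.

```python
def reduced_text(phrase, guesses):
    """Build Shannon's reduced text: for each letter of the phrase, emit
    a dash if the participant's guess was correct, else the letter
    itself. 'guesses' holds one guess per character; non-letters (e.g.,
    spaces) are copied through unchanged."""
    out = []
    for g, actual in zip(guesses, phrase):
        if not actual.isalpha():
            out.append(actual)       # spaces, punctuation pass through
        elif g == actual:
            out.append("-")          # redundancy: correctly guessed
        else:
            out.append(actual)       # entropy: the letter had to be shown
    return "".join(out)

# Hypothetical record: word-initial letters missed, the rest guessed.
print(reduced_text("the tree", "ahe bree"))  # t-- t---
```

The dashes mark what the participant already knew; only the letters would need to be transmitted, which is the point Shannon develops next.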

The letter-guessing experiment in Figure 2.22 is more than a curiosity. Shannon was motivated to quantify the entropy of English in information-theoretic terms. He pointed out, for example, that both lines in each phrase-pair contain the same information in that it is possible, with a good statistical model, to recover the first line from the second. Because of the redundancy in printed English (viz. the dashes), a communications system need only transmit the reduced text. The original text can be recovered using the statistical model. Shannon also demonstrated how to compute the entropy of printed English. Considering letter frequencies alone, the entropy is about 4.25 bits per letter.11 Considering previous letters, the entropy is reduced because there is less uncertainty about forthcoming letters. Considering long-range statistical effects (up to 100 letters), Shannon estimated the entropy of printed English at about one bit per letter with a corresponding redundancy of about 75 percent.
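The frequency-only calculation follows Shannon’s formula H = −Σ pᵢ log₂ pᵢ. The Python sketch below computes it from symbol frequencies in any sample; reproducing the 4.25 bits-per-letter figure would require the English letter-frequency data of Figure 7.19, which is not included here, so a toy input is used instead.

```python
import math
from collections import Counter

def entropy_bits(text):
    """Entropy in bits per symbol, H = -sum(p_i * log2(p_i)), using
    symbol frequencies observed in the given sample."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values())

# Four equiprobable symbols carry log2(4) = 2 bits each; 26 equiprobable
# letters would carry log2(26) ~ 4.7 bits, and English letter frequencies
# lower this to the roughly 4.25 bits cited above.
print(round(entropy_bits("abcd"), 2))  # 2.0
```

Skewed frequencies always reduce entropy below the uniform-distribution maximum, which is exactly the redundancy the previous section described.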

See also student exercise 2-2 at the end of this chapter.

2.7 Human performance

Humans use their sensors, brain, and responders to do things. When the three elements work together to achieve a goal, human performance arises. Whether the action is tying shoelaces, folding clothes, searching the Internet, or entering a text message on a mobile phone, human performance is present. Better performance is typically associated with faster or more accurate behavior, and this leads to a fundamental property of human performance—the speed-accuracy trade-off: go faster and errors increase; slow down and accuracy improves. Reported in academic papers dating back more than a century (see Swensson, 1972, for a review), mundane and proverbial (“Haste makes waste”), and steeped in common sense (we instinctively slow down to avoid errors), it is hard to imagine a more banal feature of human performance. Clearly, research on a new interface or interaction technique that seeks to determine the speed in doing a task must consider accuracy as well.

11 The data set and calculation are given in Chapter 7 (see Figure 7.19).

FIGURE 2.23

Variability of people in performing a task such as typing.

Humans position themselves on the speed-accuracy trade-off in a manner that is both comfortable and consistent with their goals. Sometimes we act with haste, even recklessly; at other times we act with great attention to detail. Furthermore, we may act in the presence of a secondary task, such as listening to the radio, conversing with a friend, or driving a car. Clearly, context plays an important role, as do the limits and capabilities of the sensors, the brain, and the responders.

With human performance, we begin to see complexities and challenges in HCI that are absent in traditional sciences such as physics and chemistry. Humans bring diversity and variability, and these characteristics bring imprecision and uncertainty. Some humans perform tasks better than others. As well, a particular human may perform a task better in one context and environment than when performing the same task in a different context and environment. Furthermore, if that same human performs the same task repeatedly in the same context and environment, the outcome will likely vary.

Human diversity in performing tasks is sometimes illustrated in a distribution, as in Figure 2.23. Here the distribution reveals the number of people performing a task (y-axis) versus their proficiency in doing it (x-axis). The example assumes computer users as the population and illustrates typing on a conventional computer keyboard as the task. Most people fall somewhere in the middle of the distribution.

FIGURE 2.24

Simple reaction time: (a) The user fixates on the grey box. (b) After a delay, the box turns red whereupon the user presses a key as quickly as possible.

Typing speeds here are in the range of, say, 30–70 words per minute. Some people are slower, some faster. However, a small number of people will be exceedingly fast, say, 150 words per minute or faster. Yet others, also a small number, exhibit difficulty in achieving even a modest speed, such as 5 words per minute, equivalent to one word every 12 seconds.

2.7.1 Reaction time

One of the most primitive manifestations of human performance is simple reaction time, defined as the delay between the occurrence of a single fixed stimulus and the initiation of a response assigned to it (Fitts and Posner, 1968, p. 95). An example is pressing a button in response to the onset of a stimulus light. The task involves the three elements of the human shown in Figure 2.17. The cognitive operation is trivial, so the task is relatively easy to study. While the apparatus in experimental settings is usually simple, humans react to more complex apparatus all the time, in everyday pursuits and in a variety of contexts, such as reacting to the ring of a phone, to a traffic light, or to water in a bath (hot!). These three examples all involve a motor response. But the sensory stimuli differ. The ring of a phone is an auditory stimulus; a changing traffic light is a visual stimulus; hot water touching the skin is a tactile stimulus. It is known that simple reaction times differ according to the stimulus source, with approximate values of 150 ms (auditory), 200 ms (visual), 300 ms (smell), and 700 ms (pain) (Bailey, 1996, p. 41).

To explore reaction times further, a Java-based application was developed to experimentally test and demonstrate several reaction time tasks.12 (See also Appendix A.) After describing each task, the results of an experiment are presented. For simple reaction, the interface is shown in Figure 2.24. A trial begins with the appearance of a grey box in a GUI window. Following a delay, the box turns red (color is not apparent in grayscale print). This is the sensory stimulus. The user’s goal is to press a key on the system keyboard as quickly as possible after the stimulus appears. The delay between the grey box appearing and the box turning red is randomized to prevent the user from anticipating the onset of the stimulus.

12 The software, a detailed API, and related files are in ReactionTimeExperiment.zip, available on this book’s website.

FIGURE 2.25

Physical matching: (a) Initial stimulus. (b) After a delay, a second stimulus appears. (c) Setup.
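The timing logic of a simple reaction trial can be sketched without a GUI. In the Python sketch below, the delay range and the helper names are assumptions for illustration; the actual software described above is Java-based and distributed with the book.

```python
import random
import time

def stimulus_delay(min_s=2.0, max_s=5.0):
    """Randomized delay before the box turns red, so the user cannot
    anticipate stimulus onset. (This 2-5 s range is an assumption.)"""
    return random.uniform(min_s, max_s)

def reaction_time(stimulus_onset, key_press):
    """RT is the interval from stimulus onset to the key event. As noted
    later for Figure 2.28, this includes the motor-response time."""
    return key_press - stimulus_onset

# Simulated trial (no GUI): timestamps stand in for real events.
onset = time.perf_counter()
press = onset + 0.276            # hypothetical 276 ms response
print(round(reaction_time(onset, press), 3))  # 0.276
```

In a real implementation, a key press arriving before the stimulus onset would be discarded as anticipation rather than reaction.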

The software implements three extensions of simple reaction tasks: physical matching, name matching, and class matching. Each adds a layer of complexity to the cognitive operation. The tasks were modeled after descriptions by Card et al. (1983, 65–71). For physical matching, the user is presented with a five-letter word as an initial stimulus. After a delay a second stimulus appears, also a five-letter word. The user responds as quickly as possible by pressing one of two keys: a “match” key if the second stimulus matches the first stimulus, or a “no-match” key if the second stimulus differs from the first stimulus. Matches occur with 50 percent probability. An example experimental setup is shown in Figure 2.25.

Obviously, physical matching is more complicated than simple reaction, since the user must compare the stimulus to a code stored in working memory. There are many examples of similar tasks in HCI, such as entering text on a mobile phone using predictive input (T9). While entering a word, the user has in her or his mind an intended word. This is the initial stimulus. With the last keystroke, the system presents a word. This is the second stimulus. If the presented word matches the intended word, the user presses 0 to accept the word. If the presented word does not match the intended word, the user presses * to retrieve the next alternative word matching the key sequence. (Details vary depending on the phone.)

Name matching is the same as physical matching except the words vary in appearance: uppercase or lowercase, mono-spaced or sans serif, plain or bold, 18 point or 20 point. A match is deemed to occur if the words are the same, regardless of the look of the fonts. See Figure 2.26. Name matching should take longer than physical matching because “the user must now wait until the visual code has been recognized and an abstract code representing the name of the letter is available” (Card et al., 1983, p. 69).

FIGURE 2.26

Name matching: (a) Initial stimulus. (b) Second stimulus.

FIGURE 2.27

Class matching: (a) Initial stimulus. (b) Second stimulus.

For class matching, the initial stimulus contains a letter or digit. After a delay a second stimulus appears, also containing a letter or digit. The font is mono-spaced or sans serif, plain or italic, 18 point or 20 point. A match is deemed to occur if both symbols are of the same class; that is, both are letters or both are digits. Class matching takes longer still, because “the user has to make multiple references to long-term memory” (Card et al., 1983, p. 70). To avoid confusion, 0 (digit) and O (letter) are not included, nor are 1 (digit) and I (letter). (See Figure 2.27.)

The interfaces described above were tested in the lab component of a course on HCI. Fourteen students served as participants and performed three blocks of ten trials for each condition. The first block was considered practice and was discarded. To offset learning effects, participants were divided into two groups of equal size. One group performed the simple reaction task first, followed in order by the physical, name, and class matching tasks. The other group performed the tasks in the reverse order.

FIGURE 2.28

Results of an experiment comparing several reaction tasks. Error bars show ±1 SD.

The results are shown in Figure 2.28. The mean time for simple reaction was 276 ms. This value is nicely positioned in the 113 to 528 ms range noted earlier for reaction time tasks (see Figure 2.17). Note that the time measurement began with the arrival of the second stimulus and ended with the key event registered in the software when a key was pressed; thus, the measurement includes the time for the motor response.

Physical matching took about twice as long as simple reaction, depending on whether the second stimulus was a match (482 ms) or a no-match (538 ms). Interestingly enough, name matching did not take longer than physical matching. One explanation is that the words in the name-matching task had insufficient variability in appearance to require additional cognitive processing. Class matching was the hardest of the tasks, with means of about 565 ms for both the match and no-match conditions.

Choice reaction is yet another type of reaction time task. In this case, the user has n stimuli, such as lights, and n responders, such as switches. There is a one-for-one correspondence between stimulus and response. Choice reaction time is discussed in Chapter 7 on modeling.

A variation on reaction time is visual search. Here, the user scans a collection of items, searching for a desired item. Obviously, the time increases with the number of items to scan. The software described above includes a mode for visual search, with the search space configurable for 1, 2, 4, 8, 16, or 32 items. An example for N = 16 is shown in Figure 2.29. The initial stimulus is a single letter. After a random

FIGURE 2.29

Visual search: (a) Initial stimulus. (b) After a delay a collection of letters appears.

delay of two to five seconds, the squares on the right are populated with letters selected at random. The initial stimulus appears on the right with 50 percent probability. The user presses a “match” or “no-match” key, as appropriate.

A small experiment was conducted with the same 14 students from the experiment described above, using a similar procedure. The results are shown in Figure 2.30 in two forms. In (a), reaction time (RT) versus number of items (N) is plotted. Each marker reveals the mean of 14 × (10 + 10) = 280 trials. The markers are connected and a linear regression line is superimposed. At R2 = .9929, the regression model is an excellent fit. Clearly, there is a linear relationship between reaction time in a visual search task and the number of items to scan. This is well known in the HCI literature, particularly from research on menu selection (e.g., Cockburn, Gutwin, and Greenberg, 2007; Hornof and Kieras, 1997; Landauer and Nachbar, 1985). For this experiment,

RT = 498 + 41 × N ms    (1)

N = 1 is a special case since there is only one item to scan. This reduces the task to physical matching. The task is slightly different than in the physical matching experiment, since the user is matching a letter rather than a word. Nevertheless, the result is consistent with the physical matching result in Figure 2.28 (RT ≈ 500 ms).
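The regression line relating RT to N is an ordinary least-squares fit, which can be reproduced with a few lines of Python. The data points below are generated from the reported model purely for illustration; they are not the experiment’s measured means.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = b + m*x, computed directly from
    the normal equations (slope from centered cross-products)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = my - m * mx
    return b, m

# Illustrative points generated from RT = 498 + 41N (not measured data).
N = [1, 2, 4, 8, 16, 32]
RT = [498 + 41 * n for n in N]
intercept, slope = linear_fit(N, RT)
print(round(intercept), round(slope))  # 498 41
```

With real experimental means the fit is not exact, and the R² statistic (here .9929) quantifies how much of the variance the line explains.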

In Figure 2.30b, the results are given separately for the match trials and the no-match trials. The no-match trials take longer. The reason is simple. If the initial stimulus is not present, an exhaustive search is required to determine such before pressing the no-match key. If the initial stimulus is present, the user presses the match key immediately when the initial stimulus is located in the right-side stimuli. The effect only surfaces at N = 16 and N = 32, however.

Before moving on, here is an interesting reaction time situation, and it bears directly on the title of this section, Human Performance. Consider an athlete competing in the 100 meter dash in the Olympics. Sometimes at the beginning of a race there is a “false start.” The definition of a false start is rather interesting: a false start occurs if an athlete reacts to the starter’s pistol before it is sounded or within 100 ms after.13 Clearly, an athlete who reacts before the starter’s pistol sounds is anticipating, not reacting. Interesting in the definition, however, is the criterion that a false start has occurred if the athlete reacts within 100 ms after the starter’s pistol is sounded. One hundred milliseconds is precariously close to the lower bound on reaction time, which is cited in Figure 2.17 as 113 ms. Card et al. peg the lower bound at 105 ms (Card et al., 1983, p. 66). World records are set, and gold medals won, by humans at the extreme tails of the normal distribution. Is it possible that a false start is declared occasionally, very occasionally, when none occurred (e.g., honestly reacting 95 ms after the starter’s pistol is fired)? There are slight differences between the lower-bound reaction times cited above and the false-start scenario, however. The values cited are for pressing a key with a finger in response to a visual stimulus. The motor response signals in the 100 meter dash must travel farther to reach the feet. This tends to lengthen the reaction time. Also, the stimulus in the 100 meter dash is auditory, not visual. Auditory reaction time is less than visual reaction time, so this tends to shorten the reaction time. Nevertheless, the example illustrates the application of low-level research in experimental psychology to human performance and to the design of human-machine systems.

13 Rule 161.2 of the International Association of Athletics Federations (IAAF) deems a false start to occur “when the reaction time is less than 100/1000ths of a second.” See www.iaaf.org/mm/Document/imported/42192.pdf (107).

FIGURE 2.30

Results of visual search experiment: (a) Overall result with linear regression model. (b) Results by match and no-match trials.

2.7.3 Skilled behavior

The response time tasks in the previous section are simple: a sensory stimulus initiates a simple cognitive operation, which is followed by a simple motor response. It takes just a few trials to get comfortable with the task, and with additional practice there is little if any improvement in performance. However, in many tasks, human performance improves considerably and continuously with practice. For such tasks, the phenomenon of learning and improving is so pronounced that the most endearing property of the task is the progression in performance and the level of performance achieved, according to a criterion such as speed, accuracy, degree of success, and so on. Skilled behavior, then, is a property of human behavior whereby human performance necessarily improves through practice. Examples include playing darts, playing chess and, in computing scenarios, gaming or programming. One’s ability to do these tasks is likely to depend significantly on the amount of practice done.

The examples just cited were chosen for a reason. They delineate two categories of skilled behavior: sensory-motor skill and mental skill (Welford, 1968, p. 21). Proficiency in darts or gaming is likely to emphasize sensory-motor skill, while proficiency in chess or computer programming is likely to emphasize mental skill. Of course, there is no dichotomy. All skilled behavior requires mental faculties, such as perception, decision, and judgment. Similarly, even the most contemplative of skilled tasks requires coordinated, overt action by the hands or other organs.

While tasks such as gaming and computer programming may focus on sensory-motor skill or mental skill, respectively, other tasks involve considerable elements of both. Consider a physician performing minimally invasive surgery, as is common for abdominal procedures. To access the abdominal area, a camera and a light mounted at the end of a laparoscope are inserted through a small incision, with the image displayed on an overhead monitor. Tools are inserted through other incisions for convenient access to an internal organ. The surgeon views the monitor and manipulates the tools to grasp and cut tissue. In Figure 2.31a, the tips of the surgeon’s tools for grasping (left) and cutting (top) are shown as they appear on a monitor during a cholecystectomy, or gallbladder removal. The tools are manually operated, external to the patient. Figure 2.31b shows examples of such tools in a training simulator. The tools are complex instruments. Note, for example, that the tips of the tools

FIGURE 2.31

Sensory-motor skill combined with mental skill during laparoscopic surgery: (a) Tips of tools for grasping and cutting. (b) Exterior view of tools and monitor in a training simulator.

(Photos courtesy of the Centre of Excellence for Simulation Education and Innovation at Vancouver General Hospital)

articulate, or bend, thus providing an additional degree of freedom for the surgeon (Martinec, Gatta, Zheng, Denk, and Swanstrom, 2009). Clearly, the human-machine interaction involves both sensory-motor skill (operating the tools while viewing a monitor) and mental skill (knowing what to do and the strategy for doing it).

One way to study skilled behavior is to record and chart the progression of skill over a period of time. The level of skill is measured in a dependent variable, such as speed, accuracy, or some variation of these. The time element is typically a convenient procedural unit, such as trial iteration, block or session number, or a temporal unit such as minutes, hours, days, months, or years. Measuring and modeling the progression of skill is common in HCI research, particularly where users confront a new interface or interaction technique. The methodology for evaluating skilled behavior is presented in Chapter 5 (see Longitudinal Studies), with the mathematical steps for modeling presented in Chapter 7 (see Skill Acquisition). See also student exercise 2-4 at the end of this chapter.
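
The progression of skill is commonly modeled as a power law, T = a · n^(−b), where T is the task time on trial n. As a minimal sketch (the trial data below are invented for illustration), the parameters can be estimated by least-squares regression on log-transformed data:

```python
import math

# Fit the power law of practice, T = a * n**(-b), by linear regression
# on log-log data. The completion times here are illustrative only.
trials = [1, 2, 3, 4, 5, 6, 7, 8]
times = [12.0, 9.5, 8.2, 7.4, 6.9, 6.5, 6.2, 6.0]  # seconds per trial

xs = [math.log(n) for n in trials]
ys = [math.log(t) for t in times]
k = len(xs)
mx, my = sum(xs) / k, sum(ys) / k
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

a = math.exp(intercept)  # predicted time on trial 1
b = -slope               # learning rate: larger means faster improvement
print(f"T = {a:.2f} * n^(-{b:.3f})")
```

Plotting the observed times against the fitted curve (or on log-log axes, where a power law is a straight line) is the usual way to judge the fit; the modeling steps are treated in Chapter 7.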

2.7.4 Attention

Texting while driving. It’s hard to imagine a more provocative theme to open this discussion on attention. Although driving a car is relatively easy, even the most experienced driver is a potential killer if he or she chooses to read and send text messages while driving. The problem lies in one’s inability to attend to both tasks simultaneously. Much like the bottleneck posed by working memory (7 ± 2 items), the human ability to attend is also limited. But what is the limit? More fundamentally, what is attention? Which tasks require attention? Which do not? How is human performance impacted? According to one view, attention is a property of human behavior that occurs when a person who is attending to one thing cannot attend to another (Keele, 1973, p. 4). Typing, for example, requires attention because while typing we cannot engage in conversation. On the other hand, walking requires very little attention since we can think, converse, and do other things while walking. One way to study attention is to observe and measure humans performing two tasks separately and then to repeat the procedure with the two tasks performed simultaneously. A task with performance that degrades in the simultaneous case is said to require attention.
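
The dual-task procedure just described reduces to a small calculation. In the sketch below (Python; the scores are invented for illustration), a task's attention demand is expressed as the percent drop in performance from the single-task to the dual-task condition:

```python
# Percent performance decrement when a task is paired with another task.
# A large decrement suggests the task requires attention; a decrement
# near zero suggests it does not. The scores below are illustrative.

def attention_demand(single_task_score: float, dual_task_score: float) -> float:
    return (single_task_score - dual_task_score) / single_task_score * 100

# Typing speed measured alone vs. typing while conversing (invented values):
decrement = attention_demand(single_task_score=40.0, dual_task_score=25.0)
print(f"{decrement:.1f}% decrement")  # 37.5% decrement
```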

Attention is often studied along two themes: divided attention and selective attention (B. H. Kantowitz and Sorkin, 1983, p. 179). Divided attention is the process of concentrating on and doing more than one task at a time. Texting while driving is an example, and the effect is obvious enough. In other cases, divided attention poses no problem, as in walking and talking. Selective attention (aka focused attention) is attending to one task to the exclusion of others. For example, we converse with a friend in a crowded, noise-filled room while blocking out extraneous chatter. But there are limits. In that same conversation we are occasionally unable to recall words just spoken because our attention drifted away or was pulled away by a distraction. Selective attention, then, is the human ability to ignore extraneous events and to maintain focus on a primary task. One theory of selective attention holds that our ability to selectively attend bears on the importance of the events to the individual. A person listening to a speech is likely to stop listening if the person’s name is spoken from another location (Keele, 1973, p. 140). One’s own name is intrinsically important and is likely to intrude on the ability to selectively attend to the speech. Clearly, importance is subjective. Wickens gives an example of an airplane crash where the flight crew were preoccupied with a malfunction in the cockpit that had no bearing on the safety of the flight (Wickens, 1987, p. 249). The crew attended to the malfunction while failing to notice critical altimeter readings showing that the airplane was gradually descending to the ground. The malfunction was of salient importance to the flight crew.

The distinction between divided and selective attention is often explained in terms of channels (Wickens, 1987, p. 254). Events in different channels (e.g., visual, auditory, motor) can be processed in parallel, whereas events within a single channel are processed serially. When events compete within a single channel, one event may intrude on the ability to focus attention on another. When events arrive through different channels, we can focus on one event to the exclusion of others or divide attention in a convenient manner between the channels.

Analyzing accidents is an important theme in human factors, as the aviation example above illustrates, and there is no shortage of incidents. Accidents on the road, in the air, on the seas, or in industry are numerous, and in many cases the cause is at least partly attributable to the human element: to distractions or to selectively attending to inappropriate events. One such accident involving a driver and a cyclist occurred because a Tamagotchi digital pet distracted the driver.14 Evidently, the pet developed a dire need for “food” and was distressed: bleep, bleep, bleep, bleep, bleep. The call of the pet was of salient importance to the driver, with a horrific and fatal outcome (Casey, 2006, pp. 255–259). More likely today, it is the call of the mobile phone that brings danger. The statistics are shocking, yet unsurprising: a 23-fold increase in the risk of collision while texting (Richtel, 2009). Attention is also relevant in HCI in, for example, office environments, where interruptions that demand task switching affect productivity (Czerwinski, Horvitz, and Wilhite, 2004). The mobile age has brought a host of issues bearing on attention. Not only are attention resources limited, these resources are engaged while users are on the move. There is a shift toward immediate, brief tasks that demand constant vigilance and user availability, with increasingly demanding expectations in response times. So-called psychosocial tasks compete for and deplete attention resources, with evidence pointing to an eventual breakdown of fluency in the interaction (Oulasvirta, Tamminen, Roto, and Kuorelahti, 2005).

2.7.5 Human error

Human error can be examined from many perspectives. In HCI experiments testing new interfaces or interaction techniques, errors are an important metric for performance. An error is a discrete event in a task, or trial, where the outcome is incorrect, having deviated from the correct and desired outcome. The events are logged and analyzed as a component of human performance, along with task completion time and other measurable properties of the interaction. Typically, errors are reported as the ratio of incorrectly completed trials to all trials, and are often reported as a percent (× 100). Sometimes accuracy is reported: the ratio of correctly completed trials to all trials.
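
Computing the two measures is straightforward. A minimal sketch (Python; the trial outcomes are invented for illustration):

```python
# Error rate and accuracy from a log of per-trial outcomes,
# where True means the trial was completed correctly.

outcomes = [True, True, False, True, True, True, False, True, True, True]

errors = outcomes.count(False)
error_rate = errors / len(outcomes) * 100  # ratio of error trials, x 100
accuracy = 100.0 - error_rate              # ratio of correct trials, x 100

print(f"Error rate: {error_rate:.1f}%")  # Error rate: 20.0%
print(f"Accuracy: {accuracy:.1f}%")      # Accuracy: 80.0%
```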

Two examples for computing tasks are shown in Figure 2.32. A GUI target selection task is shown on the left in two forms. The top image shows the goal: moving a tracking symbol from a starting position to a target and ending with a select operation. The bottom image shows an error, since the final selection was outside the target. A text entry task is shown on the right. The goal of entering the word quickly is shown correctly done at the top. The bottom image shows an error, since the word was entered incorrectly.

Mishaps and miscues in human performance are many. Often, a simple categorization of the outcome of a task as correct or incorrect falls short of fully capturing the behavior. We need look no further than Figure 2.32 for examples. Not only were the results of the tasks on the bottom erroneous in a discrete sense, there were additional behaviors that deviated from perfect execution of the tasks. For the target selection error, the tracking symbol veered off the direct path to the target. For the text entry error, it appears that at least part of the word was correctly entered.

Taking a broader perspective, human error is often studied by examining how and why errors occur. Once again, Figure 2.32 provides insight. In the erroneous

14 Ample descriptions of the Tamagotchi are found in various online sources (search using “Tamagotchi”).

Target Selection Text Entry
Correct quickly
Incorrect qucehkly

FIGURE 2.32

Common computing tasks completed correctly (top) and incorrectly (bottom).

target selection task, was there a control problem with the input device? Was the device’s gain setting too sensitive? Was the device a mouse, a touchpad, an eye tracker, a game controller, or some other input control? Note as well that the tracking symbol entered then exited the target. Was there a problem with the final target acquisition in the task? In the erroneous text entry task, if input involved a keyboard, were errors due to the user pressing keys adjacent to correct keys? Were the keys too small? If entry involved gestural input using a finger or stylus on a digitizing surface, did the user enter the wrong gesture or an ill-formed gesture? Was the digitizing surface too small, awkwardly positioned, or unstable? Clearly, there are many questions that arise in developing a full understanding of how and why errors occur. Note as well that the questions above are not simply about the human; they also question aspects of the device and the interaction.

An even broader perspective in analyzing errors may question the environmental circumstances coincident with the tasks. Were users disadvantaged due to noise, vibration, lighting, or other environmental conditions? Were users walking or performing a secondary task? Were they distracted by the presence of other people, as might occur in a social setting?

Human factors researchers often examine human error as a factor in industrial accidents where the outcome causes substantial damage or loss of life. Such events rarely occur simply because a human operator presses the wrong button, or commits an interaction error with the system or interface. Usually, the failures are systemic: the result of a confluence of events, many having little to do with the human.

To the extent that a significant accident is determined to have resulted from human error, a deeper analysis is often more revealing. Casey’s retelling of dozens of such accidents leads to the conclusion that the failures are often design-induced errors (Casey, 1998, 2006). This point is recast as follows: if a human operator mistakenly flicks the wrong switch or enters an incorrect value, and the action results in a serious accident, is the failure due to human error? Partly so, perhaps, but clearly the accident is enabled by the design of whatever he or she is operating. A design that can lead to catastrophic outcomes purely on the basis of an operator’s interaction error is a faulty design. For safety-critical systems, interaction errors by an operator must be considered and accounted for. Such errors are not only possible, they are, in time, likely. Designs of safety-critical systems must accommodate such vagaries in human behavior.

STUDENT EXERCISES

2-1. Penfield’s motor homunculus in Figure 2.9 illustrates the area in the cerebral cortex devoted to human responders. The sketch includes solid bars corresponding to the cortical area for each responder. The length of each bar is a quantitative indicator. Reverse engineer the motor homunculus to determine the length of each bar. The general idea is shown below for the toes and ankles.

     A           B     C     D     E     F     G     H
 1   Responder   x1    y1    x2    y2    dx    dy    Length
 2   Toes        52    153   55    111   -3    42    42.1
 3   Ankle       56    106   64    58    -8    48    48.7

The shaded cells contain values digitized from an image processing application. The toes bar, for example, extends from (52, 153) to (55, 111). Using the Pythagorean theorem, the length is 42.1 pixels. Of course, the scale and units are arbitrary. Evidently, there is about 15.7 percent more cortical area devoted to the ankle than to the toes. See above. This is also evident in the figure. For all responders, digitize the endpoints of the corresponding bars and enter the values in a spreadsheet, as above. Create a bar chart showing the relative amounts of cortical area for each responder. It might be useful to collect together the values for the leg, arm, and head, with each shown as the sum of contributing responders. Write a brief report discussing the motor homunculus and the empirical data for the various responders.
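
The length calculation can also be scripted once the endpoints are digitized. A minimal sketch (Python; the coordinates are the toes and ankle values given in the exercise):

```python
import math

# Bar lengths from digitized endpoints, per the Pythagorean theorem.
# Coordinates are in pixels; scale and units are arbitrary.
bars = {
    "Toes": ((52, 153), (55, 111)),
    "Ankle": ((56, 106), (64, 58)),
}

lengths = {}
for name, ((x1, y1), (x2, y2)) in bars.items():
    lengths[name] = math.hypot(x2 - x1, y2 - y1)  # sqrt(dx**2 + dy**2)
    print(f"{name}: {lengths[name]:.1f} pixels")
```

The same loop extends to all responders; the resulting dictionary feeds directly into a bar chart.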

2-2. Conduct a small experiment on memory recall as follows. Find an old discarded keyboard and remove the key tops for the letters (below left). Using a drawing application, create and print an outline of the letter portion of a Qwerty keyboard (below right). Find five computer users (participants). Ask each one to position the key tops in the printout. Limit the time for the task to three minutes. Record the number of key tops correctly positioned. (Suggestion: Photograph the result and do the analysis afterward.)

Then assess each participant’s typing style and typing speed as follows. Open a blank document in an editor and enter the phrase “the quick brown fox jumps over the lazy dog.” On the next line, ask the participant to correctly type the same phrase. Measure and record the time in seconds. Repeat five times. For each participant, note and record whether the typing style is touch or hunt-and-peck. Enter the data into a spreadsheet. Convert the time to enter the phrase (t, in seconds) to typing speed (s, in words per minute) using s = (43/5)/(t/60). Write a brief report on your findings for the number of key tops correctly positioned. Consider participants overall as well as by typing speed and by typing style. Discuss other relevant observations.
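
The conversion formula can be checked with a few lines (Python; the 12-second example time is invented for illustration):

```python
# Typing speed in words per minute from phrase entry time, using the
# exercise's formula s = (43/5) / (t/60): 43 characters in the phrase,
# five characters per "word," and t in seconds.

def typing_speed_wpm(t_seconds: float, chars: int = 43) -> float:
    return (chars / 5) / (t_seconds / 60)

print(round(typing_speed_wpm(12.0), 1))  # 43.0
```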

2-3. Conduct a small experiment on redundancy and entropy in written English, similar to Shannon’s letter guessing experiment described earlier (see Figure 2.22). Use 5–10 participants. For the experiment, use the LetterGuessingExperiment software provided on this book’s web site. Use five trials (phrases) for each participant. The setup dialog and a screen snap of the experiment procedure are shown below:

Data collection is automated in the software. Analyze the results for the number of letters correctly guessed (redundancy) and the number incorrectly guessed (entropy). Examine the results overall and by participants. Investigate, as well, whether the responses differ according to the position of letters in words and in phrases. Write a brief report on your findings.

2-4. Construct a 2D chart on skilled behavior showing sensory-motor skill on one axis and mental skill on the other. For both axes, add the label little near the origin and lots near the end. An example of a similar chart is given in Figure 3.46. Add markers in the chart showing at least five computing skills. Position the markers according to the relative emphasis on sensory-motor skill and mental skill in each task. Write a brief report, describing each skill and rationalizing the position of the marker in the chart. For guidance, see the discussion in this chapter on skilled behavior. For further guidance, read the discussion in Chapter 7 on descriptive models.

2-5. Conduct a small experiment on gestural input, human performance, and human error using the GraffitiExperiment software on this book’s website. There is both a Windows version and an Android version. Recruit about 10 participants. Divide the participants into two groups and use a different input method for each group. Consider using a mouse and touchpad (Windows) or a finger and stylus (Android). The software uses Graffiti gestures for text entry. For the experiment, the participants are to enter the alphabet 10 times. The setup dialog and a screen snap of the experimental procedure are shown below for Windows (top) and for Android (bottom).

One of the options in the setup dialog is “Phrases file.” Use alphabet.txt. Set the “Number of phrases” to 10. Leave “Show gesture set” checked. The gestures are viewable in the experiment screen (see above). Participants may correct errors using the Backspace stroke (←). However, instruct participants not to attempt more than three corrections per symbol. Data collection is automated. Consult the API for complete details. Analyze the data to reveal the progress over the 10 trials for both groups of participants. Analyze the entry speed (wpm), error rate (%), and keystrokes per character (KSPC). (A “keystroke,” here, is a gesture stroke.) Write a brief report on your findings.
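
The three measures named above can be computed per trial from the logged data. A minimal sketch (Python; the counts are invented, not actual GraffitiExperiment output):

```python
# Entry speed (wpm), error rate (%), and keystrokes per character (KSPC)
# for one trial of text entry. A "keystroke" here is a gesture stroke,
# so KSPC above 1.0 reflects corrective strokes such as Backspace.

def entry_speed_wpm(chars_entered: int, seconds: float) -> float:
    return (chars_entered / 5) / (seconds / 60)

def error_rate_pct(incorrect_chars: int, total_chars: int) -> float:
    return incorrect_chars / total_chars * 100

def kspc(strokes: int, chars_entered: int) -> float:
    return strokes / chars_entered

# One alphabet (26 characters) entered in 60 s with 30 strokes, 1 error:
print(round(entry_speed_wpm(26, 60), 1))  # 5.2
print(round(error_rate_pct(1, 26), 1))    # 3.8
print(round(kspc(30, 26), 2))             # 1.15
```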

2-6. Conduct a small experiment on human reaction time using the ReactionTimeExperiment software provided on this book’s website. Recruit about 10 participants. The setup dialog is shown below. Examples of the experimental procedure are given above (see Reaction Time).

Consider modifying the software in some way, such as using words instead of letters for the visual search task, or using an auditory stimulus instead of a visual stimulus for the simple reaction time task. The modification can serve as a point of comparison (e.g., visual search for words versus letters, or reaction time to an auditory stimulus versus a visual stimulus). Write a brief report on your findings.

Brainstorm, Chainstorm, Cheatstorm, Tweetstorm: New Ideation Strategies for Distributed HCI Design

Haakon Faste

HCI Institute Carnegie Mellon

Nir Rachmel

HCI Institute Carnegie Mellon

hfaste@cs.cmu.edu nir.rachmel@gmail.com

ABSTRACT

In this paper we describe the results of a design-driven study of collaborative ideation. Based on preliminary findings that identified a novel digital ideation paradigm we refer to as chainstorming, or online communication brainstorming, two exploratory studies were performed. First, we developed and tested a distributed method of ideation we call cheatstorming, in which previously generated brainstorm ideas are delivered to targeted local contexts in response to a prompt. We then performed a more rigorous case study to examine the cheatstorming method and consider its possible implementation in the context of a distributed online ideation tool. Based on observations from these studies, we conclude with the somewhat provocative suggestion that ideation need not require the generation of new ideas. Rather, we present a model of ideation suggesting that its value has less to do with the generation of novel ideas than with the cultural influence exerted by unconventional ideas on the ideating team. Thus brainstorming is more than the pooling of “invented” ideas; it involves the sharing and interpretation of concepts in unintended and (ideally) unanticipated ways.

Author Keywords

Ideation; brainstorming; chainstorming; cheatstorming; tweetstormer

ACM Classification Keywords

H.5.2. User Interfaces: Theory and Methods; H.5.3. Group and Organization Interfaces: Collaborative computing

General Terms

Design; Experimentation.

INTRODUCTION

The ability to generate new ideas as part of a creative design process is essential to research and practice in human-computer interaction. How best to generate ideas is not entirely clear, however. Not only are countless design and research methodologies commonly employed by HCI teams, their ideation effectiveness depends on numerous interdependent and variable factors including the scope and objectives of the project in

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

CHI 2013, April 27–May 2, 2013, Paris, France. Copyright © 2013 ACM 978-1-4503-1899-0/13/04...$15.00.

Russell Essary

HCI Institute Carnegie Mellon

Evan Sheehan

HCI Institute Carnegie Mellon

russell.essary@gmail.com wesheehan@gmail.com

question, the expertise and variety of the people involved, the strength and familiarity of their social relationships— not to mention their degree of familiarity with previous ideation and research activities—and cultural and personal factors including a person’s workplace norms and values, personal motivations and desires, confidence, degree of social collaboration, esteem, and so on. In this paper we describe the results of a design-driven study conducted with the aim of improving collaborative ideation on HCI projects using distributed software tools. Specifically we focused our research on how digital tools might be used to enhance the practice of group ideation among members of asynchronously distributed collaborative teams.

A range of different ideation techniques are used in design and HCI. In this paper, we begin with a discussion of the relative benefits and drawbacks of one such ideation method, specifically brainstorming, as described by Osborn [33] and evaluated by Isaksen [21], among others. We then describe a design research process that explored the creation of distributed brainstorming alternatives. Two exploratory studies were performed. First, we developed and tested an ideation method we refer to as cheatstorming. Using this technique, previously generated brainstorm ideas are delivered to targeted local contexts without the need for imaginative ideation. We then performed a second study of the cheatstorming method to better understand its implications and improve its efficiency. Based on observations from these studies, we conclude with the observation that ideation need not be limited to the generation of new ideas. From this perspective, the value of group ideation activities such as brainstorming has less to do with the creation of novel ideas than its cultural influence on the ideating team. Ideation, in short, is the radical redistribution of ideas to “unconventionalize” a given context.

Brainstorming Effectiveness as an Ideation Technique

The term brainstorming is best identified today with Osborn’s book on creativity titled Applied Imagination, first published in 1953 [33]. Osborn, who worked as an advertising executive in the 1940s and 50s, wrote a detailed examination of the creative problem solving process, and introduced brainstorming as one part of this process. Rich with current examples from that time, the book attempted to systematically define a method for deliberate creative group ideation from a very practical standpoint. Osborn divided the process into three main phases [33, p. 86]:

(1) Fact-finding: Problem-definition and preparation; gathering and analyzing the relevant data.

(2) Idea-finding: Idea-production and idea-development; thinking of tentative ideas and possible leads and then selecting and combining them.

(3) Solution finding: Evaluation and adoption; verifying the offered solutions, and deciding on and implementing a final selected set.

In great detail, Osborn explains suggested practices for performing each of these stages, focusing in particular on the Idea-finding phase. He claimed that Idea-finding is “the part of problem-solving that is most likely to be neglected” by groups [33, p. 111], and offered four guidelines that should be carefully followed in order to conduct a brainstorming session effectively and yield the best results:

1. Criticism is ruled out: Adverse judgment of ideas must be withheld until later.

2. “Free-wheeling” is welcomed: The wilder the idea the better; it is easier to tame down than to think up.

3. Quantity is wanted: The greater the number of ideas, the more the likelihood of useful ideas.

4. Combination and improvement are sought: In addition to contributing ideas of their own, participants should suggest how ideas of others can be turned into better ideas; or how two or more ideas can be joined into still another idea. [33]

Since then, many have built on these rules as brainstorming has become an increasingly popular method for idea generation in business and academic contexts. For example, Osborn’s rules have been adapted to be more playful and memorable for educational purposes (e.g. “Gleefully suspend judgment,” “Leapfrog off the ideas of others” [14]), and additional rules such as “be visual,” “stay focused on the topic,” and “one conversation at a time” have been added to better guide brainstorming sessions in the context of corporate design consulting [23].

Despite its widespread adoption in collaborative innovation environments in industry, the effectiveness of brainstorming has been a hot topic of debate in the academic community since its first introduction. The first criticism was sparked by a 1958 paper published by a group from Yale University (Taylor, Berry and Block) that compared the performance of randomly assigned brainstorming groups with that of randomly assigned individuals whose work was later pooled [42]. Numerous subsequent studies (e.g. [2, 35, 8, 26]) have built on this work to critique the effectiveness of brainstorming in groups relative to individuals working independently, arguing, among other things, that fewer good ideas are generated for each hour of individual effort expended.

It is important to note, however, that the Taylor, Berry and Block study [42] did not actually test the effectiveness of the rules of brainstorming, since the same rules were applied to both experimental conditions (individual and group). To be fair, Osborn recognized the necessity and advantages of working in groups for many reasons beyond the sheer quantity of ideas produced, especially when solving problems [33, p. 139]. In fact, the guidelines he suggested were specifically targeted at addressing the common inhibitory factors of group ideation. Rules such as “defer judgment” and “go wild” aimed not at individual productivity but at improved social dynamics and sharing of ideas between members of a team. He also made the point to address a common misconception, stating that “group brainstorming is recommended solely as a supplement to individual ideation” [33, pp. 141-142].
Still, studies critiquing the effectiveness of brainstorming on the grounds that it is inefficient were widespread through the late 1990s. More recently, the debate on productivity and collaboration has transferred to the domain of computer-mediated ideation (discussed below).

Limitations of Brainstorming

Three major explanations have been offered to account for lower purported productivity in brainstorming groups relative to ideating alone: production blocking, evaluation apprehension, and free riding [8]. We discuss each briefly in turn.

Production blocking

Since only one person speaks at a time in a group setting, others are inhibited from expressing their ideas while another team-member is speaking, potentially slowing their ability to generate new ideas of their own. It is not the lack of speaking time in total that causes the alleged inhibition, as many times the flow of ideas ends before the end of a brainstorm session. Rather, it has been claimed that some participants’ ideas are suppressed or forgotten later in the process, as they may seem less relevant or less original than others being expressed. Furthermore, being in a situation where participants must passively listen to others’ ideas may distract and interrupt their thought processes and ability to record their own ideas. Examples of studies looking into this hypothesis can be found in [3, 24, 8, 16].

Evaluation apprehension

Creativity by definition is an unconventional act, and being creative therefore involves taking personal risks [13]. Even though one of the most important rules for successful brainstorming is to “defer judgment,” the fear of being criticized for having original ideas is often pervasive. Numerous authors have studied this phenomenon of “evaluation apprehension.” Maginn and Harris [29], for example, performed an experiment in which a brainstorming group was told that there were expert evaluators watching them through a one-way mirror. No major difference was observed between brainstorming performance in this condition relative to a control condition in which participants were not informed that they were being observed. In another study [5], groups of brainstorm participants were informed that some members of the group were “undercover” experts on the topic at hand. In this case, productivity loss was observed in groups that had been informed of their presence relative to a control group that had not been told.

Free riding

It may be the case that a brainstorming participant’s motivation to work decreases if they do not perceive that they will be recognized for their participation. Since brainstorming is a group activity in which all the generated ideas are ultimately grouped together, it is often the case that the generated results are not attributed to their specific contributor. Indeed, lower identifiability of ideas may reduce participants’ motivation to contribute, compared to an individual task where they know that their contribution will be recognized. Furthermore, many studies have shown that there is a lower perceived effectiveness of the individual in a group setting [8].

Structuring Ideation: Three Approaches Defined

In most of the aforementioned studies, proponents of brainstorming as an ideation technique tend to be its practitioners in the business and design communities (such as Osborn himself), while its detractors tend to be researchers interested in studying creative techniques but divorced from the nuances of its deeply embedded and culturally contextual practice [21]. Yet because the act of brainstorming incorporates numerous independent and complicated social variables—not least the makeup and experience of the team, the project objectives, the rules employed, and highly contextual success criteria—its effectiveness is difficult to study and empirically discern. Indeed, given that different ideation workplaces are likely to have differing communication patterns and communication needs depending on their cultural makeup and personnel, we find measuring the output of group ideation as a replacement for individual work to be an unsatisfactory approach. More compelling is the question of how intrinsic social and collaborative factors influence group ideation results by introducing “strangeness.” Perhaps this reflects our team’s ideological bent as design practitioners, but in today’s world, problem solving often requires experts from different fields, and new ideas are frequently sparked from novel combinations of existing concepts or the introduction of an existing concept to an unfamiliar context of use [27, 41]. Many authors have addressed the role of social factors in ideation. In this work, we ask how social factors and their resulting effects can be leveraged to develop more effective methods of group ideation online. Research has shown that social factors provide fresh sources of unexpected ideas that can help to reframe the design challenge, with design tools such as extreme characters and interaction labeling proposed as ways of dialing in the necessary “strangeness” for ideation to occur [9, 17]. 
Other classic ideation techniques include the use of ‘random input’ [6] and ‘oblique strategies’ [11] to generate fresh associations; by drawing on unexpected prompts and unrelated ideas to un-stick conventional thinking, such ‘trigger concepts’ bring fresh associations to the context of ideation, stimulating other associations “like pebbles dropping in a pond” [43]. Drawing on these sources, we ask how brainstorming could be improved as a collaborative ideation technique through alternative methods of random input. In general, we classify three common social configurations of idea-generation behavior: (1) face-to-face brainstorming in groups; (2) individual (or “nominal”) idea generation sessions; and (3) computer-mediated ideation. We discuss the unique traits of each of these approaches in turn:

Face-to-face Brainstorming Groups

The classic brainstorming session is conducted face-to-face in groups during a fixed period of time, usually between 15 and 45 minutes [33, p. 178], and is facilitated by a trained brainstorming expert who enforces the rules of brainstorming on the group. Participation is simultaneous and spontaneous: all participants can see each other’s ideas and are encouraged to build upon them. Ideas are recorded as they are suggested. At the end of a brainstorming session, Kelley et al. [23] suggest that participants vote on their favorite ideas as a way of generating closure and group consensus about which ideas are most compelling for future work. As for optimal group size, in his original writings on brainstorming Osborn suggested groups of up to 12 as effective [33, p. 159], but there is no agreement in more recent literature on an optimal size (e.g. [16, 36, 4]), partly because it is difficult to define “optimal” in the context of real-world practice.

Nominal Idea Generation Sessions

Nominal idea generation is done individually. The main element that defines this method is that participants are not influenced by the variety of social factors at play in a traditional brainstorming group: they cannot build on other participants’ ideas because they are not exposed to them, they will be less influenced by perceived criticism of their ideas in real time (although they may be reluctant to share them afterwards), they may be highly motivated to perform their work in the anticipation that their efforts will eventually be rewarded, and so on. Extensive research has been done to study the benefits and shortcomings of classic vs. nominal brainstorming, as described above. In general, it appears that nominal brainstorming has some benefits in terms of both quality and quantity of ideas [20, 30, 10, 28], due to psychological effects identified by Diehl & Stroebe [8].

Computer Mediated Ideation

Advances in digital technology have led to the potential for a variety of computer-mediated ideation techniques. Within this category, the term “electronic brainstorming” refers to any kind of brainstorming mediated by computers (e.g. [40, 7, 1]). One issue in attempting to define electronic brainstorming is that any online activity involving people entering information into cloud-based systems can be considered the contribution of “ideas” to a digital pool. For our purposes we therefore consider an electronic brainstorm to be only that subset of software-mediated interactions in which users are specifically asked to generate creative responses to a question or prompt. This differs slightly (with regard to intent) from forums in which people are asked to contribute “best practices” or “suggestions” based on prior knowledge simply as an act of knowledge transfer (e.g., suggestion portals where users recommend local restaurants or hotels). It also differs from critique feeds and forums, such as comment streams debating the relative merits of a position or themed around a topic of debate, although such activities are certainly related to electronic brainstorming and can be useful tools for evaluating brainstorming results as well as for later phases of the ideation process.

The ideation approaches described above (group brainstorming, nominal idea generation, and computer-mediated ideation) are not mutually exclusive, and can be combined to make the most of each method. A brainstorming session could be performed in two parts, for example: the first in the nominal style, followed by a face-to-face session to evaluate and combine ideas across participants. Electronic brainstorming can also support both nominal and group methods, or implement a diverse array of combinations between them. Indeed, it is precisely because of the flexibility of electronic methods to distribute various aspects of the brainstorming task across asynchronous distributed teams that we performed the studies described in the following section. Group ideation is an integral part of HCI research practice, and an area where improved software interactions could greatly enhance how ideation happens in research laboratories, design firms, and product companies alike.

METHODOLOGY AND DESIGN RESEARCH

Our investigation began with the simple premise that collaborative ideation could be enhanced through the use of distributed online tools, and that design-driven approaches could be used to explore and investigate the possibilities of this space. Our design team consisted of four members with diverse backgrounds spanning design consulting, software engineering, anthropology, and management. We held regular meetings over the course of several months to conduct freeform exploratory design research. Sessions were held once or twice weekly for 1-3 hours per session, in a design studio in the HCI Institute at Carnegie Mellon University. This section describes our design research process, consisting of the following phases: (1) opportunity finding; (2) electronic brainstorming; (3) concept selection and refinement; and (4) experimentation and discussion.

Opportunity finding

We began with a vision for an online space to browse and share ideas, where they could be tagged, filtered, and contextualized in the cloud. This vision was founded on two beliefs: that creators are everywhere, and that they are driven by creative ideas for which they seek open outlets. Although a clear plan for how to develop such a system was not yet evident, we first created a series of exploratory concept sketches to help envision possible outcomes and establish goals. We then analyzed aspects of our concept drawings and generated a set of Post-It notes chronicling our complete list of observations and desires. Next, we arranged these notes on a 2x2 matrix to help group them into clusters and synthesize common themes. Because our aim at this stage was to work on a meaningful project that was enjoyable and inspirational to the team, the two axes of the matrix, each ranging from low to high, were “Fun Impact” and “Social Impact.” Seven areas of opportunity emerged from this exercise: (1) Reveal hidden (personal) meanings through metaphorical leaps of imagination; (2) Facilitate the discovery of thinking patterns; (3) Track creative influence to motivate participation; (4) Associate and juxtapose unexpected ideas; (5) Help people find ideas that are important to them; (6) Invent and embody “creative movements”; and (7) Spark and inspire interest and freedom.

Electronic Brainstorming

Given our interest in exploring the possibilities of electronic brainstorming, we decided to experiment with distributed ideation online. Using the identified opportunity areas as jumping-off points for generative design, we restated each of the seven opportunity statements described above as a “How could we...” question (e.g. “How could we facilitate the discovery of thinking patterns?”). Each question was placed at the top of a separate new Google Docs file. We then invited more than 30 interdisciplinary undergraduate and graduate students in the HCI Institute to these seven files. All of these students had prior experience with group brainstorming, and each was instructed to contribute at least five ideas in response to one or more of the brainstorm questions. We performed this activity over the course of a four-day weekend, with the stated goal of achieving at least 50 ideas in response to each question. On the fourth day, five of the seven questions had more than 50 ideas; for the remaining two questions, the research team made a concerted effort to generate the ideas still needed. In total, 350 distinct opportunity concepts were generated. Next, seven of the most involved members of the laboratory team were asked to “vote” on their favorite ideas in each file by adding a brightly colored symbol next to the item number. In this way, a selected group of 35 “favorite” ideas was agreed upon from across all seven questions.

Concept Selection and Refinement

Favorite ideas were printed out on paper, cut into strips, and placed on an Impact/Achievability matrix [15]. We then gave each of these ideas a more concise name by applying colorful Post-it notes on top of them and drawing broad categories around them with a marker. The main outcome of this phase was two key concepts, both falling in the “easy” and “high-impact” quadrant. The first was a group of ideas we labeled “idea factories,” of which one was particularly compelling: the concept of an idea “broken telephone” game. We refer to this concept in general as “chainstorming.” The second was a category of ideas we identified as “creative judgment tasks,” involving quickly voting on pre-existing ideas, much as we had done at the end of our electronic brainstorming sessions. We refer to this concept in general as “cheatstorming,” as described in studies 1 and 2 below. Finally, while not discussed here in detail, we are currently building a working prototype system that combines chainstorming with cheatstorming, called Tweetstormer, also described below. To clarify, the relationship between brainstorming, chainstorming, cheatstorming, and Tweetstormer is shown in figure 1.

Figure 1. A taxonomy of interrelated ideation techniques.

Experimentation: Cheatstorming (Study 1)

Our main work in this paper explores the cheatstorming concept. The basic premise of this paradigm is as follows: imagine a brainstorm has been performed, resulting in 50 ideas. Participants vote on their favorite ideas, and some of them are selected for implementation. Now another brainstorm is performed on a different topic, resulting in 50 more ideas and additional voting. In time, many hundreds of brainstorm questions are asked, and thousands of ideas are generated and saved. Some have been implemented, and others have not. At this point, a wealth of valuable brainstorming has already occurred. The cheatstorming paradigm proposes that no new ideas are necessary for further ideation to occur. Given a new prompt question and a set of 50 random previous ideas to draw from, cheatstorming simply bypasses the concept generation phase altogether and jumps directly to voting on which ideas to advance.
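As a sketch, the paradigm reduces to two operations: sample prior ideas, then select winners. The following Python fragment is purely illustrative (the idea bank, the scoring callable, and all names are our own stand-ins, not part of any deployed system); the voting step is modeled as an optional scoring function because the real selection is a human judgment:

```python
import random

def cheatstorm(prompt, idea_bank, pool_size=50, n_winners=10, vote=None):
    """Skip idea generation: sample prior ideas, then jump straight to voting.

    prompt    -- the new ideation question (no new ideas are generated for it)
    idea_bank -- flat list of ideas saved from previous brainstorms
    vote      -- optional callable scoring an idea against the prompt;
                 a stand-in for the human voting step
    """
    pool = random.sample(idea_bank, min(pool_size, len(idea_bank)))
    if vote is None:
        # Without human voters, fall back to a random pick from the pool.
        return random.sample(pool, min(n_winners, len(pool)))
    ranked = sorted(pool, key=lambda idea: vote(prompt, idea), reverse=True)
    return ranked[:n_winners]

# Study 1 shape: 10 random prior concepts pared down to 4 "winners".
bank = [f"concept {i}" for i in range(60)]
winners = cheatstorm("How could we illuminate large cities for less money?",
                     bank, pool_size=10, n_winners=4)
```

Note that the prompt plays no role in generation at all; it only frames the selection, which is the essence of the paradigm.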

To test this concept we performed a simple pilot experiment. First, each member of our team generated 3-5 “totally random” brainstorm questions on Post-It notes, not in response to any particular question or stated need (e.g. “What is the easiest way to make the most people happy cheaply?”). Next, a set of 60+ solution concepts was generated equally at random (e.g. “Magnetic cellphones”, “Non-linear presentation tool”, “Magic annoying elf that re-arranges your clothing,” etc.). Finally, one of the previously generated brainstorm questions was selected at random and paired with 10 of the concept Post-Its at random. From these 10 ideas, the four concepts that most closely resonated as solutions to the given question were selected as “winners.” We repeated this process four times with four different questions. One of the sample solution pairings is shown in figure 2.

Figure 2. A sample solution pairing from the cheatstorming pilot.

We were both surprised and delighted by the results of this method. Not only did we have little difficulty identifying those ideas that best resonated with the questions being asked, the resulting set of ideas was remarkably unexpected and fresh. Most exciting, the process was fast, fun, and required low effort, and the solutions revealed unexpected combinatory patterns and juxtapositions. In the first example shown in figure 2, for instance, the question asks “How could we illuminate large cities for less money to reduce nocturnal crime?” Surprisingly, three of the selected solution concepts are screen-based ideas that all emit light. Not only was this an unanticipated means of illumination, it was also one that could provide other forms of safety from nocturnal crime—via an interactive “call for help” kiosk or informative map, for example. Furthermore, the fourth idea in this set, “airbag for walking,” suggests that perhaps solutions for reducing nocturnal crime could be built directly into a user’s clothing. Combined with the other cheatstormed ideas, this in turn sparks a train of thought that perhaps clothing should be illuminated, or—alternatively—that the city’s streets should be padded. Finally, each of the other cheatstormed questions resulted in an equally compelling set of results. In response to the question “How could we reduce global warming effectively in the next five minutes?,” for example, “bio degradable vehicles” and “micro-financing” were among the selected concepts. While neither of these ideas may enable global warming to be reduced in the next five minutes alone, when combined together they indicate a potential direction for immediate action (i.e., green-vehicular crowdfunding).

Experimentation: Cheatstorming (Study 2)

There are many variables in the way that cheatstorming could be performed that we were curious to explore, such as how different types of “idea input” would affect cheatstorming results. We also wanted to compare cheatstorming results with results from a traditional brainstorming session. To this end, our next study leveraged the results of five previously completed brainstorming sessions from other unrelated projects as input. We chose data from prior brainstorming sessions that had been well documented with clear questions and solutions, and which had generated more than 50 ideas apiece. These ideas had also been voted upon in the previous iteration, enabling us to track the success or failure of previously successful ideas in the new cheatstorming context. Finally, it was important to us that the brainstorming sessions had been performed by different groups of participants spanning a diverse set of HCI topics, both to ensure that we had a wide variety of ideas in our pool to draw from and to reduce unanticipated biases based on the authorship of ideas.

Figure 3. Raw-idea cards from the five prior brainstorms, printed in a unique color per set.

The prompts from the five selected sets of data were as follows: (1) “How could we summarize text-based information to make browsing it intuitive, useful, magical and fun?”, from a project on digital mind mapping; (2) “How could we sculpt and craft using digital tools?”, from a project on tangible computing; (3) “How could we encourage self-actualization and the experience of new experimental dynamics?”, from a project on augmented reality; (4) “How could we support the successful publication of confident high quality writing?”, from a project on narrative fiction; and (5) “How could we rigorously craft and curate the design of aesthetically pleasing narrative products and services?”, also from the narrative fiction project. Our study design involved four experimental conditions drawing on brainstorming results from the above-mentioned sets of data. All of the previously generated raw ideas from each set were printed on cards in a unique color, one color per set (Figure 3). These raw-idea cards were used as input data for each of our study conditions. In addition, the idea cards that had originally been selected within each set as the “winners” for that set were clearly marked with an asterisk; this allowed us to trace which previously successful ideas prevailed through the cheatstorming process. The study conditions were designed to be structurally equivalent: in each case, 50 raw “input” ideas would be pared down to 10 “winning” ideas in response to the ideation prompt. We used the same ideation prompt across all conditions: question 5 (“How could we rigorously craft and curate the design of aesthetically pleasing narrative products and services?”). The experimental conditions, illustrated in figure 4, were as follows:

Condition A (brainstorming baseline). Previously selected brainstorming results from set 5 (those with asterisks) were chosen automatically as de facto winners.

Condition B (overlapping diverse input). 17 ideas were selected at random from each of sets 2, 3, and 4, combining to make a total of 51 ideas; one idea was then removed at random, resulting in 50. Cheatstorming then commenced using question 5 as the ideation prompt. Because set 4 was drawn from the same project as set 5, cheatstorm results were anticipated to be most similar to condition A.

Condition C (unrelated diverse input). The same diverse input structure was used as in condition B, except input ideas were drawn from sets 1, 2, and 3. These ideas were not intentionally related to set 5 in any way.

Figure 4. Experimental conditions for cheatstorming study 2.

Condition D (unrelated narrow input). This session used a single unrelated set of ideas as input, from set 1.
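The input pools for the cheatstorming conditions can be constructed mechanically. The sketch below (with purely illustrative names; the idea sets are modeled as plain lists) reproduces the condition B/C recipe of 17 random draws from each of three sets, trimmed from 51 to 50:

```python
import random

def diverse_pool(idea_sets, per_set=17, target=50):
    """Condition B/C input: equal random draws from several sets, trimmed to target."""
    pool = [idea for s in idea_sets for idea in random.sample(s, per_set)]
    random.shuffle(pool)
    return pool[:target]          # drops the one surplus idea at random

# Hypothetical stand-ins for three of the prior brainstorm data sets.
set1 = [f"mind-mapping idea {i}" for i in range(55)]
set2 = [f"tangible idea {i}" for i in range(55)]
set3 = [f"AR idea {i}" for i in range(55)]

condition_c = diverse_pool([set1, set2, set3])   # unrelated diverse input
```

Condition D would simply pass a single set (`diverse_pool([set1], per_set=50)`), and condition A bypasses pooling entirely by taking the previously selected winners.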

Cheatstorming proceeded by laying out all 50 input ideas for a given condition below the ideation prompt, then working through them one-by-one as a team, attempting to find ideas that matched the brainstorming prompt (figure 5). Ideas that didn’t seem related were put aside. Remaining ideas were grouped into 10 “winning” clusters, such that each cluster formed a meaningful concept relevant to the prompt. Each cluster was then given a more concise and meaningful title to capture the newly synthesized idea (figure 6).

DISCUSSION

As described in detail by Isaksen [21], evaluating the effectiveness of group ideation outcomes is fraught with methodological and practical problems. These include the necessity of identifying and isolating the different factors in the ideation tool or process likely to influence its effectiveness; being aware of the level of training (if any) the facilitator has gone through to run the session; determining the group’s experience with creative ideation in general (and their orientation to the task at hand in particular); the preparation and presentation of the task in such a way that it promotes ideation; the effectiveness of the ideation method in highly contextual real-world practice; and the criteria employed to evaluate the outcomes. Given these challenges, we believe it is difficult if not impossible to generalize the effectiveness of a specific culturally embedded creative activity without first recognizing the serious practical limitations of attempting to do so. For this reason, the approach taken in this study was design-oriented, in line with Fallman’s characterization of design-oriented HCI as giving form to previously non-existent artifacts to uncover new knowledge that could not be arrived at otherwise [12]. We attempted to replicate a controlled methodology as precisely as possible across the four runs of our cheatstorming study, each time varying only the set of ideas input into the selection process. Each time, 50 previously defined ideas were reduced to 10 “winning” favorites by the same team of researchers, and each time our 10 favorite ideas were unique. Given the creative and intentionally unpredictable nature of ideation, we believe that even though we held these variables constant (i.e. same team, same brainstorming rules, same prompt question, etc.), we would likely generate differing results if we repeated the study again. This stated, some noteworthy qualitative observations can be made, and we now reflect on the qualitative differences between the conditions of study 2, in both the application of the process and the outcomes it produced.

Figure 5. Working through input ideas one-by-one against the ideation prompt.

Figure 6. Winning idea clusters, each given a synthesized title.

Findings: Process

Cheatstorming was shown to be a fast and enjoyable means of creative ideation. Especially when cheatstorming with ideas drawn from different and diverse input sets, we found that the method works well as a mechanism for introducing novel concepts across creative cultures, a process akin to “technology brokering” among brainstorming teams whose ideas cross-pollinate [41]. Indeed, the greatest challenge and thrill of the cheatstorming method is being faced with the task of combining what often seem to be nonsensical results from previous brainstorming sessions (in that they contain remarkably little context by which to understand them) with ideation prompts that are likely to be equally without adequate context (especially should cheatstorming be widely deployed in a distributed setting). The natural reaction of the cheatstormer, indeed their only real option, is to force an inventive connection between idea and prompt. In this regard we posit that, within reason, the more tightly constrained the input data given to the cheatstormer when synthesizing across large sets of data, the more effective they will be at identifying such juxtapositions. We note, for example, that our first study involved the reduction of 10 input ideas to four “winners,” which could be accomplished very quickly because the cheatstormer had no alternative but to pick something that worked. Study 2, with its larger set of inputs and greater creative freedom, introduced an overwhelming quantity of possible connections and, consequently, felt more tedious and less productive.

Based on this observation, we believe that setting time-oriented constraints might help to improve the cheatstorming experience. While the mandatory rigor of matching 10 winning concepts per question in study 2 had value, it also left additional room for idea comparison and judgment, leading to more nuanced but ultimately (we feel) less inspired ideas. Adding a time limit or other creative constraints might encourage spontaneous connections and force weaker ideas to be eliminated more quickly on a visceral basis.

Comparing the process across experimental conditions, it seemed both easier and more immediately intuitive to group together ideas that came from the same original source. Looking at the results of our final synthesis, however, we notice a distinctly integrated mixing of source material in the creation of our final generated concepts. This also highlights one of the possible biases that became evident as a result of our process: in retrospect, we wish we had not color-coded the input ideas, as doing so introduced a perceptible value judgment into the study. Indeed, simple awareness that such a bias may have existed likely led us, intentionally or not, to use equal numbers of ideas of each color. As a result, it is difficult to say whether the approximately even survival rate of ideas across input pools within each condition resulted from this bias.

Another source of bias was ideas that repeated across subsequent iterations. This influence was twofold: foremost, since the prompt remained the same for all four cheatstorming iterations, arriving at similar ideas in each round quickly became redundant. Furthermore, because each cheatstorm used a random mix of input material, about a third of the ideas from previous conditions re-appeared with each subsequent effort. In this regard, we believe that ideas that have been previously “used” by participants should be removed from the input pool in successive rounds. Anticipating digital systems that scale these methods up to large crowds of users, we recommend tracking previously viewed ideas to prevent them from appearing again. Not only did ideas seem “less interesting” on the second occasion, they also became harder to associate with new outcomes and meanings. Indeed, if creativity systems are to be tasked with delivering unconventional content to users, it is essential that the content not be familiar.
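A minimal sketch of this recommendation, assuming a centralized idea pool with per-participant tracking (all names here are hypothetical, not part of any built system):

```python
import random

class IdeaPool:
    """Serve random ideas while excluding those a participant has already seen."""

    def __init__(self, ideas):
        self.ideas = list(ideas)
        self.seen = {}                       # participant -> set of served ideas

    def draw(self, participant, n):
        seen = self.seen.setdefault(participant, set())
        fresh = [i for i in self.ideas if i not in seen]
        batch = random.sample(fresh, min(n, len(fresh)))
        seen.update(batch)                   # never re-serve these to this participant
        return batch

pool = IdeaPool(f"idea {i}" for i in range(120))
round1 = pool.draw("alice", 50)
round2 = pool.draw("alice", 50)              # guaranteed fresh for alice
```

Tracking is per participant, so an idea one person has exhausted can still appear fresh, and therefore “strange,” to someone else.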

Findings: Results

In addition to the cheatstorming team’s qualitative reflections on process, we consulted with an independent judge who had worked on the narrative fiction project to which the ideation prompt had originally belonged. Together we evaluated the top 10 “winning” idea clusters from each of the four conditions, to see if cheatstorming results would be applicable for potential real-world use on her project.

Relative to the baseline brainstorm condition (condition A), the most noticeable quality of the winning cheatstormed ideas was that all of them were dramatically technological in nature. This is not surprising, given that the narrative fiction project was the least technologically oriented of the prompts (the other three questions having been drawn from projects on augmented reality, tangible computing, and digital mind mapping, respectively). Furthermore, the degree to which ideas felt unhelpful to the project was directly proportional to their degree of strain. In condition A, the baseline condition, the ideas felt the most immediately useful and applicable to the project because they did not all have such a technology focus. We should also note, however, that our judge had originally been involved in selecting the baseline winners, but not the cheatstorming results, introducing a likely source of bias. Condition B, the overlapping diverse input group, produced the most palatable set of the remaining ideas; it seemed to introduce fresh new ideas that were grounded in something familiar. Condition C, the unrelated diverse input group, was described as “the most random”: ideas in this set, with names such as “tempo of experience control” and “real-time story-world generation,” were exciting but felt out of touch with project goals. Condition D, the unrelated narrow input group, produced the most technologically immersive ideas. Ideas such as “magic story wand” and “crowdsourced tangible narrative sculpting” were described as “nice to pursue if I had a team of designers and developers, but that would change the focus of what the project is really about.”

In summary, all of the resulting ideas were related to the ideation prompt, but clearly reflected the spirit of the brainstorm from which they originated. This is not surprising, but it does indicate that a mix of somewhat related (but also diverse and different) ideas could have a positive impact in broadening the scope of a project’s ideation.

CONCLUSIONS AND FUTURE WORK: CHAINSTORMING, TWEETSTORMING, AND CHEATSTORMING AT SCALE

This work has investigated distributed ideation from a design-driven perspective by designing and building prototypes of possible ideation mechanics and reflecting on the qualities of the outcomes. Our aim with this approach is to improve the design of HCI tools that facilitate efficient and effective group ideation.

Reflecting on our findings, we realize that we have revealed a model for group ideation with four distinct stages of progressive activity. Each stage carries with it a set of differing requirements and resulting behaviors, and we expect that the criteria leading to effective ideation outcomes at each stage will be different. These stages are: (1) prompting, the stage during which the ideation facilitator presents a challenge to the group that will drive ideation; (2) sharing, the stage in which participants suggest and communicate ideas within the context of the medium that frames the activity (i.e., orally, and/or using a whiteboard, sticky-notes, database system, and so on); (3) selecting, the phase during which participants vote and/or otherwise determine their favorite ideas; and (4) committing, the stage at which a final criterion is set to evaluate and prioritize ideas, ultimately determining which ones the team moves forward with and (ideally) develops.
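Read as a pipeline, the four stages suggest a simple data structure. The sketch below is only an illustration of the model’s shape (the class name, method names, and the winner count are our own choices), not a description of any implemented system:

```python
from dataclasses import dataclass, field

@dataclass
class IdeationSession:
    """Four-stage group ideation: prompting -> sharing -> selecting -> committing."""
    prompt: str = ""
    shared: list = field(default_factory=list)
    selected: list = field(default_factory=list)
    committed: list = field(default_factory=list)

    def prompting(self, challenge):
        self.prompt = challenge              # facilitator presents the challenge
        return self

    def sharing(self, ideas):
        self.shared.extend(ideas)            # ideas are communicated, not "generated"
        return self

    def selecting(self, votes, keep=10):
        # Keep the participants' favorites, ranked by vote tally.
        self.selected = sorted(self.shared,
                               key=lambda i: votes.get(i, 0),
                               reverse=True)[:keep]
        return self

    def committing(self, criterion):
        # A final criterion decides which ideas the team moves forward with.
        self.committed = [i for i in self.selected if criterion(i)]
        return self

session = (IdeationSession()
           .prompting("How could we illuminate large cities for less money?")
           .sharing(["interactive kiosk", "airbag for walking", "padded streets"])
           .selecting({"interactive kiosk": 3, "airbag for walking": 2})
           .committing(lambda idea: "street" not in idea))
```

Each stage operates only on the output of the previous one, which is the sense in which the model treats ideation as a staged transfer rather than a single generative act.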

This framing is in contrast to previous ideation models (e.g. Jones’ “divergence, transformation, convergence” model [22], Nijstad et al.’s dual pathway ideation model [32], etc.) in that, while it recognizes the cognitive distribution of ideation across social structures, it does not view creative behavior as a “generative” activity. Instead, ideas are simply transferred (or “shared”) between people, and the act of sharing is the source of the ideation: it involves the expression and interpretation of possible conceptual meanings. Even in traditional brainstorming sessions, we propose, it is this communicative interplay between one person’s conception of an idea and another’s (mis)interpretation that results in the so-called “generation” of ideas. Cheatstorming demonstrates that ideas need not be created by the team for ideation to occur—they simply need to be interpreted as possibilities resulting from a collision of shared meanings. The only requirement for a successful ideation outcome is that the ideas introduced in the sharing stage are unconventional to the ideating individual, team, or culture [24] (i.e. “strange” [18]), and that they be interpreted as relevant (or not) to the ideation prompt.

We have introduced the concepts of cheatstorming, as ideation without the “idea generation” component, and chainstorming, more generally, as a paradigm of communicative ideation (figure 1). Rather than conceiving of creativity as a spontaneous act of personal imagination, chainstorming is intrinsically social. It is inspired by the “broken telephone” (or “Chinese whispers”) social group game, in which one person (Alice) secretly tells a story to another person (Bob), such that none of the other people present can hear it. In turn, Bob tells the story as he remembers it to a third person (Carol), and so on, until everyone in a continuous chain back to Alice has been reached. The last person to hear the story shares what he or she remembers with the entire group, and that version is compared with the original story.

In chainstorming, much like this game, each participant is asked to build on the story of the previous participant in the chain. The first person in the chain generates the prompt question and one or two ideas that respond to the question before sending it off to a network of friends. Each subsequent person sees the prompt question, along with a subset of the ideas from the previous participant, and uses these ideas to build on them and generate new ideas. Using this method, which introduces a degree of randomness at each stage and which can also be controlled by the design of the communication and its rules, we propose that collective creativity can be embedded in social networks through simple interactions that reduce cognitive effort. Indeed, similar approaches have been developed in recent related work, promising the development of evolutionary creativity algorithms wherein humans pick the “fittest” ideas to result in emergent solutions to potentially complex tasks [45]. In chainstorming, where a random subset of each participant’s previous ideas could be selected and passed along with each interaction, the continued juxtaposition and “constructive strain” [18] from potentially unrelated or even contradictory ideas could consistently spark unexpected new socially generated concepts. Indeed, it is the unique ability of cheatstorming to “dial in strangeness,” as explored in our study, that makes it such a compelling example of the future of ideation online. In the case of cheatstorming, this is far more nuanced than existing methods of random input, such as future workshops [31], inspiration card workshops [19], or other similar methods for lateral thinking, in that it enables operational changes to the ideation methodology and content directly, and thus can facilitate targeted and highly contextual “leaps” from an original set of ideas to a much wider framing of the problem domain.
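A simulation can make the chain mechanic concrete. In this sketch participants are simple callables standing in for people, and the carry parameter controls how many of the previous participant’s ideas accompany the prompt; every name here is illustrative, not a description of a built system:

```python
import random

def chainstorm(prompt, participants, seed_ideas, carry=2):
    """'Broken telephone' ideation: each participant sees the prompt plus a
    random subset of the previous participant's ideas and builds on them."""
    all_ideas = list(seed_ideas)
    handoff = list(seed_ideas)
    for respond in participants:
        subset = random.sample(handoff, min(carry, len(handoff)))
        new_ideas = respond(prompt, subset)   # the human/creative step
        all_ideas.extend(new_ideas)
        handoff = new_ideas or handoff        # pass the latest ideas down the chain
    return all_ideas

# Toy participants that riff on whatever they receive.
def riff(prompt, ideas):
    return [f"{idea}, but wearable" for idea in ideas]

chain = chainstorm("How could we reduce nocturnal crime?",
                   [riff, riff, riff],
                   ["illuminated clothing", "padded streets"])
```

The random subset at each handoff is where the degree of “strangeness” gets dialed in: a smaller carry forces greater interpretive leaps, while a larger one preserves more shared context along the chain.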

Clearly the success of chainstorming as a paradigm depends largely on the details of its implementation since, as noted in our discussion of brainstorming best practices, several factors will greatly influence its outcomes. Much like offline group brainstorming, effective chainstorming is likely to depend heavily on the social constitution of the chain, the level of training (if any) that participants receive, the group’s experience and orientation with the task at hand, and the criteria employed to evaluate its outcomes. Moreover, ideation of this nature introduces additional factors that will need to be addressed: in particular, the potential lack of context accompanying the prompt communication; which (and how many) prior concepts accompany the message as it is passed from user to user, and how their selection is determined; and how to handle redundant concepts, dead ends, cross-posting, parallel chains, and so on. Indeed, these are complicated issues that underlie all social messaging and communication networks.

In order to investigate these questions of ideation more deeply, and to identify best practices for chainstorming networks, we have begun the design and development of a new social media platform for ideation, Tweetstormer, which will leverage Twitter messages as the transactional medium of the chainstorming system. Using this platform, members of the online community will be able to post and respond to tweeted prompt questions to virally distribute the chainstorm. Not only will this enable Twitter users to ideate anytime from anywhere using their computer or mobile device; we also plan to implement a custom website that allows users to see other users’ questions, reply to them selectively, browse other users’ replies to prompts, and vote for their favorite ideas. Our hope is that ideation via this and other similarly inspired platforms will enable a more nuanced empirical study of the chainstorming paradigm and how best to integrate it effectively into the social fabric of online innovation.

REFERENCES

  1. Barki, H., & Pinsonneault, A. (2001). Small Group Brainstorming and Idea Quality: Is Electronic Brainstorming the Most Effective Approach?, Small Group Research, 32, 158.
  2. Bayless, O. L. (1967). An alternative for problem solving discussion, Journal of Communication, 17, 188-197.
  3. Bouchard, T. J., & Hare, M. (1970). Size, performance, and potential in brainstorming groups, Journal of Applied Psychology, 54, 51-55.
  4. Bouchard, T. J., Barsaloux, J., & Drauden, G. (1974). Brainstorming procedure, group size, and sex as determinants of the problem-solving effectiveness of groups and individuals, Journal of Applied Psychology, 59(2), 135-138.
  5. Collaros, P. A., & Anderson, L. R. (1969). Effect of perceived expertness upon creativity of members of brainstorming groups, Journal of Applied Psychology, 53(2), 159-163.
  6. de Bono, E. (1970). Lateral Thinking: Creativity Step By Step, Harper Perennial.
  7. DeRosa, D. M., Smith, C. L., & Hantula, D. A. (2007). The Medium Matters: Mining the long-promised merit of group interaction in creative idea generation tasks in a meta-analysis of the electronic group brainstorming literature, Computers in Human Behavior, 23, 1549-1581.
  8. Diehl, M., & Stroebe, W. (1987). Productivity Loss in Brainstorming Groups: Toward the Solution of a Riddle, Journal of Personality and Social Psychology, 53(3), 497-509.
  9. Djajadiningrat, J. P., Gaver, W. W., & Frens, J. W. (2000). Interaction relabelling and extreme characters: Methods for exploring aesthetic interactions, Proc. DIS 2000, 66-71.
  10. Dunnette, M. D., Campbell, J., & Jaastad, K. (1963). The effect of group participation on brainstorming effectiveness for two industrial samples, Journal of Applied Psychology, 47(1), 30-37.
  11. Eno, B. (1978). Oblique Strategies, Opal, London.
  12. Fallman, D. (2003). Design-Oriented Human-Computer Interaction, Proc. CHI, 225-232.
  13. Faste, R. (1993). An Improved Model for Understanding Creativity and Convention, in Cary A. Fisher (ed.), ASME Resource Guide to Innovation in Engineering Design, American Society of Mechanical Engineers.
  14. Faste, R. (1995). A Visual Essay on Invention and Innovation, Design Management Journal, 6(2).
  15. Faste, H., & Bergamasco, M. (2009). A Strategic Map for High-Impact Virtual Experience Design, Proc. SPIE, 7238.
  16. Gallupe, R., Bastianutti, L. M., & Cooper, W. H. (1991). Unblocking brainstorms, Journal of Applied Psychology, 76(1), 137-142.
  17. Graham, C., Rouncefield, M., Gibbs, M., Vetere, F., & Cheverst, K. (2007). How Probes Work, Proc. OzCHI, 29.
  18. Gordon, W. J. J. (1971). The Metaphorical Way of Learning and Knowing, Porpoise Books, p. 20.
  19. Halskov, K., & Dalsgård, P. (2006). Inspiration Card Workshops, Proc. Designing Interactive Systems, 2-11.
  20. Hegedus, D. M. (1986). Task Effectiveness and Interaction Process of a Modified Nominal Group Technique in Solving an Evaluation Problem, Journal of Management, 12(4), 545-560.
  21. Isaksen, S. G. (1998). A Review of Brainstorming Research: Six Critical Issues for Inquiry, Technical report, Creative Problem Solving Group, Buffalo, NY.
  22. Jones, J. C. (1970). Design Methods, John Wiley & Sons.
  23. Kelley, T., Littman, J., & Peters, T. (2001). The Art of Innovation: Lessons in Creativity from IDEO, America's Leading Design Firm, Crown Business.
  24. Koestler, A. (1964). The Act of Creation, Dell, NY.
  25. Lamm, H., & Trommsdorff, G. (1973). Group versus individual performance on tasks requiring ideational proficiency (brainstorming): A review, European Journal of Social Psychology, 3, 361-388.
  26. Larey, T., & Paulus, P. (1995). Social Comparison and Goal Setting in Brainstorming Groups, Journal of Applied Social Psychology, 25(18), 1579-1596.
  27. Lehrer, J. (2012). Groupthink: The brainstorming myth, The New Yorker, January 30.
  28. Madsen, D. B., & Finger, J. R. Jr. (1978). Comparison of a written feedback procedure, group brainstorming, and individual brainstorming, Journal of Applied Psychology, 63(1), 120-123.
  29. Maginn, B. K., & Harris, R. J. (1980). Effects of anticipated evaluation on individual brainstorming performance, Journal of Applied Psychology, 65(2), 219-225.
  30. Mullen, B., & Johnson, C. (1991). Productivity Loss in Brainstorming Groups: A Meta-Analytic Integration, Basic and Applied Social Psychology, 12(1), 3-23.
  31. Muller, M. J., White, E. A., & Wildman, D. M. (1993). Taxonomy of PD practices: A brief practitioner's guide, Communications of the ACM, 36(6), 26-28.
  32. Nijstad, B. A., De Dreu, C. K. W., Rietzschel, E. F., & Baas, M. (2010). The dual pathway to creativity model: Creative ideation as a function of flexibility and persistence, European Review of Social Psychology, 21, 34-77.
  33. Osborn, A. F. (1963). Applied Imagination: Principles and procedures of creative thinking (3rd edition), Scribner.
  34. Paulus, P. (2000). Groups, Teams and Creativity: The Creative Potential of Idea-Generating Groups, Applied Psychology: An International Review, 49(2), 237-262.
  35. Price, K. (1985). Problem Solving Strategies: A Comparison by Problem-Solving Phases, Group and Organization Studies, 10(3), 278-299.
  36. Renzulli, J. S., Owen, S. V., & Callahan, C. M. (1974). Fluency, flexibility, and originality as a function of group size, Journal of Creative Behavior, 8(2), 107-113.
  37. Searle, J. R. (1983). Intentionality: An Essay in the Philosophy of Mind, Cambridge University Press.
  38. Shah, J. J., & Vargas-Hernandez, N. (2003). Metrics for Measuring Ideation Effectiveness, Design Studies, 24(2).
  39. Stein, M. I. (1975). Stimulating Creativity: Group Procedures (volume 2), Academic Press, NY.
  40. Stenmark, D. (2001). The Mindpool Hybrid: Theorising a New Angle on EBS and Suggestion Systems, Proc. Hawaii International Conference on Systems Science.
  41. Sutton, R., & Hargadon, A. (1996). Brainstorming Groups in Context: Effectiveness in a Product Design Firm, Administrative Science Quarterly, 41(4), 685-718.
  42. Taylor, D. W., Berry, P. C., & Block, C. H. (1958). Does group participation when using brainstorming facilitate or inhibit creative thinking?, Administrative Science Quarterly, 3(1), 23-47.
  43. von Oech, R. (1986). A Kick in the Seat of the Pants, Harper.
  44. Watson, W., Michaelsen, L. K., & Sharp, W. (1991). Member competence, group interaction, and group decision making: A longitudinal study, Journal of Applied Psychology, 76, 803-809.
  45. Yu, L., & Nickerson, J. (2011). Cooks or Cobblers? Crowd Creativity through Combination, Proc. CHI, 1393-1402.

Res Eng Design DOI 10.1007/s00163-008-0055-0

Observations on concept generation and sketching in engineering design

Maria C. Yang

Received: 5 September 2005 / Revised: 14 December 2006 / Accepted: 3 June 2008

Springer-Verlag London Limited 2008

Abstract The generation of ideas is an essential element in the process of design. One suggested approach to improving the quality of ideas is through increasing their quantity. In this study, concept generation is examined via brainstorming, morphology charts and sketching. Statistically significant correlations were found between the quantity of brainstormed ideas and design outcome. In some, but not all, experiments, correlations were found between the quantity of morphological alternatives and design outcome. This discrepancy between study results hints at the role of project length and difficulty in design. The volume of dimensioned drawings generated during the early-to-middle phases of design was found to correlate with design outcome, suggesting the importance of concrete sketching, timing and milestones in the design process.

Keywords Concept generation · Sketching · Design process

1 Introduction

The generation of ideas is a key activity in the design process. This study focuses on one idea generation method known as brainstorming (Osborn 1963) which has been widely adopted in product design and development in industry (Sutton and Hargadon 1996; Kelley and Littman 2001). Brainstorming consists of a set of broad, process-driven guidelines that can be applied in many contexts. However, Shah et al. (2000) point out that the results of brainstorming can be “unpredictable.” This study focuses on measuring a single aspect of the brainstorming process through one of its main guidelines: generate as many ideas as possible. By broadening the initial pool of ideas, the assumption is “quantity yields quality.”

M. C. Yang, Department of Mechanical Engineering and Engineering Systems Division, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Room 3-449B, Cambridge, MA 02139, USA. e-mail: mcyang@mit.edu

Osborn’s rules have previously been examined to understand the role of social situations in creativity (Diehl and Stroebe 1991; Paulus et al. 1995; Paulus 2000). These experiments consider brainstorming through creativity exercises of brief duration and scope using subjects who generally do not have experience in design. This approach is valuable in that it limits confounding factors in the analysis of the creative process. This paper considers concept generation from another perspective, that of engineering design in multi-week projects involving engineering design students.

This paper examines three hypotheses related to concept generation in design and builds on preliminary research by the author (Yang 2003). These hypotheses were formulated in the context of engineering design courses, and are intended to help design practitioners improve their creativity and design outcomes as well as assist educators in teaching design:

Hypothesis 1: The quantity of concepts generated at the beginning of a design project correlates with design outcome.

It has been suggested that the generation of more ideas initially leads to a higher incidence of better quality ideas. It may be further surmised that having a higher quality idea in the early stages of design process may be more conducive to a better final design outcome. Do individuals who generate more ideas also have better final design projects?

This paper examines this question through the quantity of ideas generated by brainstorming. It also considers concept generation at a lower level through morphology charts, which are hierarchies of alternative embodiments that can fulfill each of a device’s functions (Zwicky 1969).

Hypothesis 2: The quantity of sketches generated during a project correlates with its design outcome.

Sketching is an integral part of the engineering design process. Sketches are representations of a design, but research suggests that the process itself of sketching is a fundamental element of design thinking (McKim 1980; Ullman et al. 1990; Schön and Wiggins 1992; Nagai and Noguchi 2003). In particular, the act of sketching is thought to be critical to generating concepts (Goldschmidt 1994; Goel 1995; Suwa and Tversky 1997; Purcell and Gero 1998; Verstijnen et al. 1998). Tovey et al. (2003) refer to sketching as a ‘‘language for handling design ideas.’’ The studies of Romer et al. (2000) posit that sketching is useful as a way to mentally offload concepts during complex design activity.

Observations of industrial practice suggest that design success is closely linked to realizing an idea through drawing and prototyping (Schrage and Peters 1999). Other research has assessed sketching to determine possible links to design outcome. Schütze et al. (2003) found that designers who were allowed to sketch during the design process produced a higher quality solution than those who were not permitted to. However, Bilda et al. (2006) suggest that sketching is not essential for design, although the more basic activity of mental imagery may still be critical. Song and Agogino (2004) found significant correlations between sketch volume and design outcome in a product development course. Yang and Cham (2007) found no links between drawing skill and design outcome, but did show that skilled sketchers tended to draw more overall.

This paper investigates the quantity of design sketching as another way to infer concept generation volume. Sketches are divided into two categories: dimensioned drawings and non-dimensioned drawings. Dimensioned drawings are of interest because they may represent ideas that are further along in the design cycle and are therefore more developed than non-dimensioned ones.

Hypothesis 3: Increased sketching at the beginning of the project, rather than at the end, correlates with better design outcome.

Designers can, and do, form ideas throughout the design cycle. Concepts may occur under formal circumstances (for example, to meet a milestone) or more spontaneously, as a classic “eureka” moment. However, concurrent engineering holds that design decisions made in the early, conceptual phases of design have greater impact on overall design than those made later on (Winner et al. 1988). This study observes whether sketches generated earlier or later on in the design cycle have different correlations with design outcome.

2.1 Measures of creativity

Idea quantity, or fluency, is but one of several measures for creativity. Other metrics for concept generation include flexibility, or the range of ideas, and originality, or novelty of idea (Guilford 1959). Shah et al. (2003) define formal metrics including novelty, variety, quality, and quantity which have been adopted by other researchers (Song and Agogino 2004; Chusilp and Jin 2006). This study focuses primarily on quantity, in part due to its relative objectivity as a measure across a range of idea representations such as sketches and textual lists of ideas.

2.2 Types of sketches

Sketches may be considered from several perspectives. Ferguson (1992) classifies three types of sketches in terms of their intended purpose: the thinking sketch serves as a reflective medium, the prescriptive sketch acts as a blueprint for design work, and the talking sketch supports design collaboration.

Sketches may be categorized by the design progression they represent. van der Lugt (2005) examines sketches as a mechanism for reinterpretation of an individual designer’s ideas. Goel (1995) defines ‘‘lateral transformations’’ as incremental changes that build on a previous idea, while ‘‘vertical’’ ones result in a more detailed version of an earlier sketch. McGown et al. (1998) and Rodgers et al. (2000) label categories based on the physical elements of engineering sketches:

• Level 1: A simple monochrome line drawing that does not include shading or annotations

3 Methods

3.1 Test bed

The test beds for these experiments were two project-based, undergraduate mechanical engineering design courses at the California Institute of Technology. In all three courses, teams were provided with identical sets of materials from which to prototype.

The first course was in advanced engineering design. Data was collected from two separate years of this course. The advanced course had twenty-four students in one year (“Course 1”) and twenty-three in the next (“Course 2”). In both, teams of two students were presented with an open-ended, ill-defined design challenge such as a robotic capture-the-flag game. They were required to develop conceptual designs and fabricate a fully functional electromechanical device in the engineering machine shop. Potential design solutions could range from simple three-wheeled cars to robotic arms to combinations of custom mechanisms. The breadth of design scope provided ample opportunity for sketching many aspects of various design concepts. At the end of the ten-week course, teams competed against each other in a double elimination contest held before the entire campus. A single winner emerged. Example projects are shown in Fig. 1, along with a representative sketch from the corresponding logbooks. Note that though students worked and competed in pairs, they were expected to produce their own separate devices that were assessed independently from their teammate’s.

Fig. 1

The second course was an introduction to design comprising thirty-three students (“Course 3”), and was a prerequisite to the advanced course. This course included a three-week long, open-ended design challenge in which students were asked to design and build a solution using “soft” materials, such as foamcore, to pop a helium-filled balloon suspended over a large water pond. Students worked in teams of three to five and were graded both as a team and as individuals. Sample projects are shown in Fig. 2, along with a drawing from the associated logbooks. As in the other course, design solutions could span a wide range of possibilities including hooking and grasping mechanisms and drivable boats. The range of potential solutions allowed a variety of sketches and ideas to be developed over the duration of the projects.

3.2 Design data

3.2.1 Brainstorming and morphology charts

Concept generation was examined through brainstorming and morphology charts. Brainstorming took place in the first third of the project for the Introductory Course. Each design team was presented with the design problem and then asked to generate and write down conceptual design alternatives in class and on their own over a span of a week. The number of brainstormed ideas generated by each student was counted.

Morphology charts were created in all three courses. Morphology charts require a more systematic, lower-level enumeration of concepts than brainstorming. Early in the projects for all three courses, each team developed a chart of the desired functions of their device, along with possible approaches to achieving those functions. For example, if the desired function was to “block an opponent,” ways to achieve that goal might be to “drive into the opponent” or “push the opponent.” Most of these embodiments were illustrated with thumbnail diagrams, and all included brief text descriptions. The total number of morphological embodiments was counted.

Fig. 2
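The structure of such a chart, and the counting metric used in the study, can be sketched as a simple mapping. The functions and embodiments below are illustrative only (drawn from the course examples in the text), not the students' actual charts.

```python
# Hypothetical morphology chart: each desired function of the device maps to
# candidate embodiments; the study's metric is the total embodiment count.
morphology_chart = {
    "block an opponent": ["drive into the opponent", "push the opponent"],
    "capture the flag":  ["hook", "grasping claw", "scoop"],
    "traverse the pond": ["wheels", "drivable boat"],
}

total_embodiments = sum(len(v) for v in morphology_chart.values())
print(total_embodiments)  # 7
```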

3.2.2 Sketching and logbooks

Paper-based design logbooks were kept by each student over the life of the project. Logbooks can be a comprehensive record of a student’s design process and thinking. The information archived in the logbooks varied widely in form, including detailed descriptions of work, plans for fabrication, engineering calculations, and sketches of many types and levels of detail.

Individual drawings in each logbook were counted and indexed by date and by whether they were dimensioned or not. Drawings were considered dimensioned if they included numeric labels for parameters such as length, width or diameter. Such drawings may be interpreted as a step towards making a design more concrete.
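The tally described above amounts to a small bookkeeping exercise. As a minimal sketch (hypothetical data and field names, not the study's actual instrument), each logged drawing is indexed by week and flagged as dimensioned if it carries numeric labels:

```python
# Illustrative sketch: counting total and dimensioned drawings per week.
from dataclasses import dataclass

@dataclass
class Drawing:
    week: int
    dimensioned: bool  # numeric labels for length, width, diameter, etc.

# Hypothetical logbook contents.
logbook = [Drawing(1, False), Drawing(2, True), Drawing(2, True), Drawing(5, False)]

per_week = {}  # week -> (total drawings, dimensioned drawings)
for d in logbook:
    total, dim = per_week.get(d.week, (0, 0))
    per_week[d.week] = (total + 1, dim + d.dimensioned)

print(per_week)  # {1: (1, 0), 2: (2, 2), 5: (1, 0)}
```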

As with any informal information, there is ambiguity as to what constitutes an individual drawing, and the goal was to be consistent between logbooks. In nearly all cases, however, distinct objects were obvious because of white space separation or clear annotations (such as explanatory text or arrows) in the sketch. Ullman et al. (1990) use “marks on paper” to refer to both sketches and annotation in order to capture the ambiguity inherent in defining design sketching. The quality of individual sketches was not considered, in part because other work suggests that sketch quality is not linked to design outcome (Yang and Cham 2007).

This study did not normalize for a student’s drawing ability or previous drawing experience. None of the students were given any sketch instruction. In Yang (2003), a survey was taken of one of the advanced courses (Course 1) to gauge the level of other relevant design experience that students might possess. The results of the survey showed that students generally felt they had above-average experience in engineering analysis, engineering intuition, engineering fabrication and arts and crafts. Students also self-reported below-average experience in drawing, tinkering, and construction.

Table 1

Course      Project length (weeks)   Brainstorming   Morphology   Sketch
1 (Adv)     10                                       ✓            ✓
2 (Adv)     10                                       ✓            ✓
3 (Intro)   3                        ✓               ✓            ✓

Table 1 summarizes the types of design data collected in each course.

3.3 Design outcome evaluation

Two indicators of design outcome were employed. The first was the individual final grade, counted in points, for each student. The maximum number of points possible was 100, and was based on overall performance over the 10-week period. For the introductory class (Course 3), project grade was also considered. Note that the final grade includes student effort for two other projects unrelated to the first project.

The second indicator, applied only to the advanced courses, was each team’s final ranking in the contest. Contest performance was based on the number of rounds of competition that the team was able to win. The more rounds won, the higher the rank. Contest performance was decoupled from the final grade itself. It was possible for a team to perform poorly in the contest but earn a good grade, and vice versa. One reason for this is that the primary goal of the class is to teach engineering design and design process, and a student could demonstrate an excellent grasp of design process but still rank low in the contest.

4 Results and discussion

4.1 Type of sketching

A total of 4,008 sketches were counted in the logbooks. 61.4% included dimensions and the remaining 38.6% did not. The dimensioned sketches were generally considered “prescriptive” sketches in that many of them were likely meant as blueprints for fabrication. The remaining sketches were virtually all “thinking” sketches (Fig. 3). In these courses, logbooks were primarily a device for capturing the thinking of the individual designer, and this function was borne out in the sketches found in the logbooks. This is also consistent with the findings of Song and Agogino (2004).

Fig. 3

Although the advanced course students were asked to document their design process in paper logbooks, students were not prohibited from using other media for visualizing their ideas, and access to CAD tools was freely available to all students. In one of the advanced courses, one out of the twenty-four students used CAD tools to create both dimensioned and non-dimensioned drawings. In the following year, six of twenty-three students employed CAD, although it is not clear why more students chose to do so. In this study, CAD drawings were treated in the same manner as hand sketches under the assumption that they are still representations of design thought. Also, at the time Ferguson wrote his book, Computer-Aided Design (CAD) tools were not as ubiquitous as they are today, and it would be reasonable to think many engineering designers create “prescriptive” sketches for themselves rather than for a third party to formalize.

There are some challenges in judging sketches post facto for two reasons. First, without explicitly asking the designer’s intended purpose with each sketch, the observer can only guess what the intention is. Second, a sketch may serve multiple purposes. For example, though many of the logbook sketches could be characterized as “thinking” sketches, their design work was performed in the context of the classroom, so it would be reasonable for students to expect that others (teammates or teaching staff) might refer to their sketches in the way they would a “talking” sketch. Evidence of this is seen in the way some sketches were annotated in first person narrative.

Although in most situations it was clear what mechanical component a particular drawing represented (for example, a wheel or a gear), the content of each sketch was not tracked. Since many of these logbook sketches were “thinking” sketches, it is difficult to interpret what the intention of the sketch was. A jumble of lines may be meaningless to the external observer, but hold important meaning for the sketcher. The quality of idea represented in a sketch was likewise not tracked.

Almost all drawings found in the logbooks were Levels 1 and 2, and occasionally Level 3, meaning that they were primarily line drawings with limited annotation and sometimes shading. It is important to note here that the levels of detail outlined by McGown et al. (1998) were based on observations of engineering students who had gone through art and industrial design training as part of their core curriculum. In comparison, the designers observed in this study are relative novices in sketching.

4.2 Concept quantity and design outcome

Hypothesis 1: The quantity of concepts generated at the beginning of a design project correlates with design outcome.

Table 2 shows that the quantity of ideas brainstormed had a statistically significant Spearman correlation (Rs) with both the three-week project grade and the overall final grade for the term.

Table 3 shows correlations between the quantity of morphological alternatives with both final grade and contest performance for Courses 1 and 2, and with grade only for Course 3. The number of morphological alternatives includes embodiments from all hierarchy levels. Note that for Course 1, N = 20 rather than 24 because of data unavailability, so for a significance level α = 0.05, Rs must be greater than 0.377. Course 2 had even less data available for analysis (N = 12). Course 3 includes two design outcome measures: the final grade for the course, and the grade for the three-week project only.

Table 2

                           Correlation coefficient, Rs
                           Project grade   Final grade
Total ideas brainstormed   0.48            0.33

N = 33, Rs = 0.291 for α = 0.05

Table 3

           N    Rs      Grade                          Contest
Course 1   20   0.377   0.160                          0.020
Course 2   12   0.503   0.19                           -0.07
Course 3   33   0.291   0.61 (project), 0.34 (final)   N/A
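The statistic reported throughout is Spearman's rank correlation, which is simply the Pearson correlation of the two variables' ranks. As a self-contained illustration with hypothetical data (not the study's actual data or code):

```python
# Illustrative computation of Spearman's Rs: rank both variables
# (averaging ranks for ties), then take the Pearson correlation of the ranks.

def ranks(xs):
    """Average 1-based ranks, with ties sharing the mean rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of the tied positions
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman_rs(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical data: ideas brainstormed per student vs. project grade.
ideas = [4, 9, 12, 6, 15, 8]
grades = [70, 82, 90, 75, 95, 88]
print(round(spearman_rs(ideas, grades), 3))  # 0.943
```

An observed Rs is then compared against the critical value for the sample size (e.g., 0.291 for N = 33 at α = 0.05, as in Table 2) to judge statistical significance.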

For Courses 1 and 2, the total morphological alternatives do not correlate positively with either grade or contest performance. In contrast, for the introductory course (Course 3), the number of alternatives correlated statistically significantly with both final and project grade.

Possible explanations for this difference between the two courses could include sample size, length of project, and level of prototyping skill required for the project. Many issues can arise during development and fabrication that are not considered or are unforeseeable in the early, conceptual stage. In the introductory course, students need only have rudimentary soft prototyping skills, and the project is only three weeks long. The likelihood that their design will change is lower than for the advanced courses which require some level of skill and training in the machine shop and whose project lasts three times as long. The response to this hypothesis then is a “maybe.” In the case of a shorter, less involved design project such as found in the introductory course, this hypothesis holds true. However, in the more involved, longer duration project such as found in the advanced courses, this hypothesis was not found to be the case.

It should be pointed out that this finding does not necessarily mean that the number of ideas is a cause of project success, only that the two variables tend to increase and decrease together. Other factors that may also be linked to the final outcome of design will be discussed in Sect. 5.

4.3 Sketch quantity and design outcome

Hypothesis 2: The quantity of sketches generated during a project correlates with its design outcome.

No statistically significant correlations were found between the total quantity of sketches of any type, dimensioned or otherwise, and final grade or contest ranking for the three courses. Overall, prolific sketchers were no more likely to have good design outcomes than those who did not sketch as much. This diverges from the findings of Song and Agogino (2004), and it is theorized that this is due to the differing natures of the design projects. The projects studied by Song were primarily product design projects with an emphasis on market studies. In contrast, Courses 1 and 2 focus on engineering design, with an emphasis on physical prototyping. This phenomenon is examined in closer detail in the following section.

This paper’s findings may be further considered in light of the work of van der Lugt (2005) who determined that idea generation was not related to better design outcomes unless designers “worked” the sketch through re-interpretive cycles. In this paper, the re-interpretation of sketches was not tracked, but may merit further examination in future work.

4.4 Sketching over time and design outcome

Hypothesis 3: Increased sketching at the beginning of the project, rather than at the end, correlates with better design outcome.

Figures 4 and 5 show the average number of sketches plotted by time in the advanced courses. The lower, darker section of each column represents the dimensioned drawings, and the entire column shows the total number of drawings (dimensioned and non-dimensioned).

The general trend in both courses is the same: fewer overall drawings in the beginning, more in the middle, and a drop off at the end of the project. There are also a proportionately greater number of total drawings during the first part of the term as compared to the remainder of the term. It is observed that the proportion of dimensioned sketches to the total number of drawings starts off low and increases towards the middle of the project. Taken together, these trends may indicate abstract conceptual design activity early in the design cycle and more prototyping in the later stages of the design cycle.

Fig. 4 Average number of sketches (dimensioned and total) per week, Course 1

Fig. 5 Average number of sketches (dimensioned and total) per week, Course 2

There are differences, however, in when the peak number of drawings/sketches occur. We see that for the first advanced class (Course 1), the peak starts almost immediately in the second week. For the other advanced class (Course 2), the peak is not reached until the fifth week. A possible explanation might be due to the design milestones in which students present their work to the teaching staff, typically in the form of drawings and prototypes. Milestones serve as a project management guideline for the students, as well as a way for teaching staff to monitor progress. Interestingly, the timing of design milestones within the project is nearly identical in both courses.

Figures 6 and 7 list the Spearman correlation coefficients (Rs) for the average total non-dimensioned and dimensioned drawings per week, correlated with final grades and contest performance for each week of the term. Dotted horizontal lines show the threshold for statistical significance.
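The rank statistic behind these figures can be illustrated with a short, self-contained sketch. The data values below are hypothetical, invented for illustration rather than taken from the study, and the hand-rolled formula assumes no tied values:

```python
# Illustrative sketch of the analysis behind Figs. 6 and 7 (hypothetical data,
# not the study's): Spearman's Rs between weekly sketch counts and grades.

def rank(values):
    """Return 1-based ranks of the values (assumes no tied values)."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rs(x, y):
    """Spearman's Rs via the classic formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    n = len(x)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank(x), rank(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# hypothetical per-student data for one week of the term
dimensioned_sketches = [4, 7, 2, 9, 5, 3, 8, 6]
final_grades = [78, 85, 70, 93, 80, 72, 90, 88]

rs = spearman_rs(dimensioned_sketches, final_grades)
print(f"Rs = {rs:.3f}")  # → Rs = 0.976, a strong positive rank correlation
```

An Rs near +1 means students who sketched more that week also tended to earn higher grades; whether a given Rs clears the dotted significance threshold in the figures additionally depends on the sample size.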

Figure 6 shows that dimensioned drawings and final grades are correlated in a statistically significant way during the first three weeks of the design cycle. The total number of drawings is also significantly correlated with grade during the third week, and with contest rank in the first week. This result is notable because it shows that the creation of more concrete dimensioned drawings early in the design cycle has a positive correlation with design outcome.

The trend in Fig. 7 is somewhat different. Dimensioned drawings correlate in a statistically significant way in the third and fourth weeks, and total drawings correlate with contest performance only in the fifth week. In this case, dimensioned drawings are still important, but not until slightly later, in the early-to-middle phases of the design cycle.

Fig. 6 Spearman correlation coefficients (Rs) between weekly drawings and design outcome, Course 1; dotted lines mark the threshold for statistical significance

Fig. 7 Spearman correlation coefficients (Rs) between weekly drawings and design outcome, Course 2; dotted lines mark the threshold for statistical significance

In both cases, the correspondences roughly coincide with each course’s peak sketch volume in Figs. 4 and 5. These peaks occur around the periodic design reviews in which students must present their project work to the teaching staff for feedback. Average student sketching activity is thus at its highest when it is most strongly correlated with design grade. One possible explanation lies in the nature of design performance measures. Both final grades and contest ranking are somewhat relative measures. While it is possible for all students in a course to earn an ‘‘A,’’ it is more likely that students will receive grades that fall along a distribution relative to each other. At the same time, the contest itself is intended to produce relative rankings, with clear standings. It makes sense, then, that the relative effort of a student during peak design activity might be associated with design outcome.

Interestingly, the correlations with grades tend to become more negative over time: correlations with design activity early in the design cycle are more positive, and those later in the design cycle are more negative. This suggests that last-minute efforts at sketching are not consistent with a good outcome.

In these cases, the data support the proposed hypothesis. In addition, the role of milestones in planning design work may be important, and falling behind in design is linked to poorer outcomes.

5 Conclusions

5.1 Concept generation and sketching

Concept generation measured in the form of morphology charts showed a statistically significant correlation with both project and final term grade in the introductory course. However, morphology charts in the advanced courses did not show a statistically significant correlation. This may be due in part to the inherent differences among methods of generating concepts. Strictly speaking, brainstorming is meant to generate a wide range of concepts, while morphology charts are only intended to enumerate mechanisms for achieving specific functions. The key difference between these two courses is the timing of concept generation relative to design outcome. A shorter elapsed project time, combined with less emphasis on detailed fabrication in the introductory class, suggests that the role of idea quantity is less important the further along in the design cycle outcome is measured.
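The distinction can be made concrete: a morphology chart maps each design function to candidate mechanisms, and candidate concepts are combinations that pick one mechanism per function. The functions and mechanisms below are invented for illustration and are not drawn from the courses studied:

```python
# Hypothetical morphology chart: each design function maps to candidate
# mechanisms; a concept selects one mechanism per function.
from itertools import product

morphology_chart = {
    "locomotion": ["wheels", "tracks", "legs"],
    "steering": ["differential drive", "rack and pinion"],
    "power": ["battery", "tether"],
}

functions = list(morphology_chart)
concepts = [dict(zip(functions, combo))
            for combo in product(*morphology_chart.values())]

print(len(concepts))  # 3 * 2 * 2 = 12 enumerable concepts
```

Unlike brainstorming, which aims at breadth, the chart bounds the concept space to the listed mechanisms: the twelve combinations here are systematic enumerations rather than divergent ideas.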

One of the unexpected findings of this study was the role of timing in concept generation and sketching. Data from the advanced courses show that dimensioned sketch quantity is significantly correlated with design grades in the first half of the design cycle.

Total sketching volume measured at the end of the term, including both dimensioned and non-dimensioned drawings, did not correlate with graded design outcome or contest performance. This suggests that it is possible to produce copious drawings during the design process and still have an unsuccessful graded design. Likewise, it is implied that a designer can sketch very little overall and achieve a better design grade, as long as the bulk of dimensioned drawings (and perhaps prototyping) are created early on in the design process. This suggests that starting work early on a project is beneficial.

It should be noted that there can be a range of approaches to maintaining a logbook. Design logbooks capture a wide variety of information in a flexible way. Many students detail their design activity meticulously. For others, however, logbooks may not always be a complete record of design thinking. This may be due in part to the overhead involved with keeping a logbook. As one student put it, ‘‘I do design in my head’’ rather than write thoughts down.

5.2 Design outcome

Two design outcomes were used in this study: final grade and contest results. Only total sketching volume correlated significantly with contest results. This is interesting because, in some ways, the contest simulates the conditions of real-world design: the situations in which a product is released are unpredictable, unlike a classroom setting that is controlled and ‘‘safe.’’ While students were provided with as realistic a testing environment as possible throughout the term, the actual contest included technical issues that could not be replicated beforehand, such as electrical interference, as well as non-technical issues such as the stress experienced by contestants competing in front of their peers. However, unlike the real world, a contest is an artificial construct. In real-world design, there is rarely an absolute notion of winning or losing, and the definition of success is open to interpretation.

The final design grade in these cases is comprehensive and assesses the overall design process. In industry, a product can be developed following a ‘‘good’’ process but the result can still be thought of as unsuccessful. Contests, on the other hand, focus on end products instead of process. In either case, the development of appropriate design outcome measures that are viable for evaluation is an open area of design research.

5.3 Sketching and prototyping

Dimensioned drawings are a logical precursor to prototyping. These results suggest that prototyping earlier in the design cycle rather than later is linked to better design grades, perhaps because starting fabrication earlier leaves more time for iteration and refinement of a design.

In some ways, the correlation between early dimensioned sketching and design outcome is counter-intuitive. The rules of brainstorming hold that designers should withhold judgment on a set of ideas before selecting one for further development and manufacturing. Concurrent engineering, as well as systems engineering, posits that more time spent up front to understand the potential downstream implications of design decisions is beneficial to design. However, the results of this study suggest that selecting a design path and prototyping earlier is preferable. This may have to do with the fact that the course being studied has a strong emphasis on prototyping a fully functional design. In fact, Yang (2003) found that students’ self-reported prior engineering building experience and engineering intuition correlated significantly with design outcome. The suggestion that earlier prototyping is consistent with better design results aligns with the findings of Ward et al. (1995), who examined the highly successful design and manufacturing process at Toyota Motor Company. Toyota’s counter-intuitive approach at the time was to prototype many design alternatives before selecting one for full production.

5.4 Limitations

It should be emphasized that all three of the hypotheses included in this paper rely on the notion of correlation rather than causation. The products of the earliest stages of the design process, such as concepts and sketches, are not examined as the sole causes of a design outcome, but as co-occurrences with design outcome that may merit further study. One of the challenges is that numerous steps are involved in design beyond the formulation stage, including fabrication and testing, that may have little to do with the initial ideas generated. Furthermore, there may be interactions among these steps that introduce additional confounding factors.

At the same time, it should be pointed out that those designers who generate greater quantities of ideas or produce more sketches or start prototyping their projects earlier may possess characteristics that will facilitate stronger design outcomes. These designers may be more motivated, more creative, and perhaps better able to address the challenges of creating a more successful design, and thus produce a better design outcome.

6 Future work

This study draws on guidelines from design practice to correlate concept quantity and sketching with design outcome. However, this research is one of many efforts to better understand sketching and concept generation in engineering design. Future work will consider specific aspects of sketches that reflect cognitive elements of concept generation and sketching as found in Shah et al. (2000, 2001, 2003). In addition, there are a number of other types of design activity that merit study to understand their possible links to design. Song and Agogino (2004) point out the importance of considering other modes of design thinking, in particular verbal/textual cognition. Indeed, one of the best performing designers in the advanced class did little or no drawing, but meticulously documented his design activity in textual form.

A potential impact of this work will be derived from the development of relevant design tools to facilitate design information handling in design teams, both in the classroom and in industry practice. What are ways to formulate ideation methods and sketching guidelines so that they can help students become more effective designers?

The role of the design logbook as a design tool should also be examined. This work focuses on both individual and team design efforts, and in real-world design situations, products are often the end result of collaborative team work. Paper design logbooks such as those used in these courses have long been employed to capture design knowledge, but electronic versions of these hold promise as tools to facilitate design thinking and process. There is relevant research in the area of electronic design notebooks (Lakin et al. 1989; Hong et al. 1994; Viste and Cannon 1995) that specifically supports design team collaboration. There exists a related body of research in textual design information analysis (Baya and Leifer 1994; Dong and Agogino 1997; Wood et al. 1998) as well as in electronic sketch capture (Yen et al. 1999) that may help design tools to be more intelligent and designers to be better designers. However, much work remains to be done in the research and development of design tools such as these. One current stumbling block is that such computational tools do not yet provide the same level of usability and critical affordances that simple paper and pen provide (Alvarado and Davis 2001). Some ideal features include portability, ease of sketching, and seamless integration with text tools. In addition, these tools need to allow easy annotation of design sketches with rough dimensions and notes. More formal methods for visualization, such as CAD systems, are appropriate for representing later stage designs, but at the conceptual stage, the goal is to encourage agile, unimpeded concept generation (Kavakli et al. 1998). Work in this area focuses both on software to recognize sketches symbolically (Do et al. 2000; Kurtoglu and Stahovich 2002), as well as on computer input devices to allow sketches to be created more easily (Dickinson et al. 2003).

Future work should focus on integrating aspects of these research areas to produce cohesive tools to support design teams and the design process itself.

Acknowledgments The author gratefully acknowledges the thoughtful support and guidance of Prof. Erik Antonsson, Prof. Joel Burdick, and Dr. Curtis Collins, and the design efforts of the students that are the foundation of this research. The work described in this paper was supported in part by the National Science Foundation under Award DMI-0547629. The opinions, findings, conclusions and recommendations expressed are those of the author and do not necessarily reflect the views of the sponsors. The author also acknowledges the generous sponsors of the advanced design course: Applied Materials, Amerigon, Dr. David and Mrs. Barbara Groce, Honeywell, idealab!, Mabuchi Motor, Northrop Grumman, The San Diego Foundation, and Toro.

References

Alvarado CJ, Davis R (2001) Preserving the freedom of paper in a computer-based sketch tool. HCI International 2001, New Orleans, Louisiana. Lawrence Erlbaum Associates, Inc, USA

Baya V, Leifer LJ (1994) A study of the information handling behavior of designers during conceptual design. Sixth international conference on design theory and methodology. American Society of Mechanical Engineers, Minneapolis

Bilda Z, Gero JS, Purcell T (2006) To sketch or not to sketch? That is the question. Des Stud 27(5):587–613

Chusilp P, Jin Y (2006) Impact of mental iteration on concept generation. ASME J Mech Des 128(1):14–25

Dickinson JK, Pardasani A, Yu Z, Zeng Y, Antunes H, Li Z (2003) Augmenting mechanical CAD with pen and tablet. ASME design engineering technical conferences. ASME, Chicago

Diehl M, Stroebe W (1991) Productivity loss in idea-generating groups: tracking down the blocking effect. J Pers Soc Psychol 61(3):392–403

Do EY-L, Gross MD, Neiman B, Zimring C (2000) Intentions in and relations among design drawings. Des Stud 21(5):483–503

Dong A, Agogino AM (1997) Text analysis for constructing design representations. Artif Intell Eng 11(2):65–75

Ferguson ES (1992) Engineering and the mind’s eye. The MIT Press, Cambridge

Goel V (1995) Sketches of thought. MIT Press, Cambridge

Goldschmidt G (1994) On visual design thinking: the vis kids of architecture. Des Stud 15(2):158–174

Guilford JP (1959) Personality. McGraw-Hill, New York

Hong J, Toye G, Leifer L (1994) Using the WWW for a team based engineering design class. Second WWW Conference, Chicago, IL

Kavakli M, Scrivener SAR, Ball LJ (1998) Structure in idea sketching behaviour. Des Stud 19(4):485–517

Kelley T, Littman J (2001) The art of innovation: lessons in creativity from IDEO, America’s leading design firm. Doubleday, New York

Kurtoglu T, Stahovich TF (2002) Interpreting schematic sketches using physical reasoning. AAAI spring symposium 2002, sketch understanding

Lakin F, Wambaugh J, Leifer L, Cannon D, Sivard C (1989) The electronic design notebook: performing medium and processing medium. Vis Comput 5:214–226

McGown A, Green G, Rodgers PA (1998) Visible ideas: information patterns of conceptual sketch activity. Des Stud 19(4):431–453

McKim RH (1980) Experiences in visual thinking. PWS Publishers, Boston

Nagai Y, Noguchi H (2003) An experimental study on the design thinking process started from difficult keywords: modeling the thinking process of creative design. J Eng Des 14(4):429–437

Osborn AF (1963) Applied imagination. Charles Scribner and Sons, New York

Paulus P (2000) Groups, teams, and creativity: the creative potential of idea generating groups. Appl Psychol 49(2):237–262

Paulus PB, Larey TS, Ortega AH (1995) Performance and perceptions of brainstormers in an organizational setting. Basic Appl Soc Psych 17(1&2):249–265

Purcell AT, Gero JS (1998) Drawings and the design process. Des Stud 19(4):389–430

Rodgers PA, Green G, McGown A (2000) Using concept sketches to track design progress. Des Stud 21(5):451–464

Romer A, Leinert S, Sachse P (2000) External support of problem analysis in design problem solving. Res Eng Design 12(3):144–151

Schön DA, Wiggins G (1992) Kinds of seeing and their functions in designing. Des Stud 13(2):135–156

Schrage M, Peters T (1999) Serious play: how the world’s best companies simulate to innovate. Harvard Business School Press, Boston

Schütze M, Sachse P, Römer A (2003) Support value of sketching in the design process. Res Eng Design 14(2):89–97

Shah JJ, Kulkarni SV, Vargas-Hernandez N (2000) Evaluation of idea generation methods for conceptual design: effectiveness metrics and design of experiments. J Mech Des 122(4):377–384

Shah J, Vargas-Hernandez N, Summers J, Kulkarni S (2001) Collaborative sketching (C-Sketch): an idea generation technique for engineering design. J Creat Behav 35(3):168–198

Shah JJ, Vargas-Hernandez N, Smith SM (2003) Metrics for measuring ideation effectiveness. Des Stud 24(2):111–134

Song S, Agogino AM (2004) Insights on designers’ sketching activities in product design teams. 2004 ASME design engineering technical conference. ASME, Salt Lake City

Sutton RI, Hargadon A (1996) Brainstorming groups in context: effectiveness in a product design firm. Adm Sci Q 41(4):685–718

Suwa M, Tversky B (1997) What do architects and students perceive in their design sketches? A protocol analysis. Des Stud 18(4):385–403

Tovey M, Porter S, Newman R (2003) Sketching, concept development and automotive design. Des Stud 24(2):135–153

Ullman DG, Wood S, Craig D (1990) The importance of drawing in the mechanical design process. Comput Graph 14(2):263–274

van der Lugt R (2005) How sketching can affect the idea generation process in design group meetings. Des Stud 26(2):101–122

Verstijnen IM, van Leeuwen C, Goldschmidt G, Hamel R, Hennessey JM (1998) Sketching and creative discovery. Des Stud 19(4):519–546

Viste MJ, Cannon DM (1995) Firmware design capture. ASME design theory and methodology conference. American Society of Mechanical Engineers, Boston

Ward A, Liker JK, Sobek D, Cristiano J (1995) The second Toyota paradox: how delaying decisions can make better cars faster. Sloan Manag Rev 36(3):43–61

Winner RI, Pennell JP, Bertrand HE, Slusarczuk MMG (1988) The role of concurrent engineering in weapon systems acquisition. Institute for Defense Analysis (IDA), Boston

Wood WH, Yang MC, Cutkosky MR, Agogino A (1998) Design information retrieval: improving access to the informal side of design. 1998 design engineering technical conferences, 10th international conference on design theory and methodology. ASME, Atlanta

Yang MC (2003) Concept generation and sketching: correlations with design outcome. 2003 ASME design engineering technical conferences. ASME, Chicago

Yang MC, Cham JG (2007) An analysis of sketching skill and its role in early stage engineering design. J Mech Des 129(5):476–482

Yen SJ, Fruchter R, Leifer L (1999) Facilitating tacit knowledge capture and reuse in conceptual design activities. ASME design engineering technical conferences, 11th international conference on design theory and methodology. ASME Press, Las Vegas

Zwicky F (1969) Discovery, invention, research through the morphological approach. MacMillan, New York

Chapter 6

The process of interaction design

6.1 Introduction
6.2 What is interaction design about?
  6.2.1 Four basic activities of interaction design
  6.2.2 Three key characteristics of the interaction design process
6.3 Some practical issues
  6.3.1 Who are the users?
  6.3.2 What do we mean by “needs”?
  6.3.3 How do you generate alternative designs?
  6.3.4 How do you choose among alternative designs?
6.4 Lifecycle models: showing how the activities are related
  6.4.1 A simple lifecycle model for interaction design
  6.4.2 Lifecycle models in software engineering
  6.4.3 Lifecycle models in HCI

6.1 Introduction

Design is a practical and creative activity, the ultimate intent of which is to develop a product that helps its users achieve their goals. In previous chapters, we looked at different kinds of interactive products, issues you need to take into account when doing interaction design, and some of the theoretical basis for the field. This chapter is the first of four that will explore how we can design and build interactive products.

Chapter 1 defined interaction design as being concerned with “designing interactive products to support people in their everyday and working lives.” But how do you go about doing this?

Developing a product must begin with gaining some understanding of what is required of it, but where do these requirements come from? Whom do you ask about them? Underlying good interaction design is the philosophy of user-centered design, i.e., involving users throughout development, but who are the users? Will they know what they want or need even if we can find them to ask? For an innovative product, users are unlikely to be able to envision what is possible, so where do these ideas come from?

In this chapter, we raise and answer these kinds of questions and discuss the four basic activities and key characteristics of the interaction design process that were introduced in Chapter 1. We also introduce a lifecycle model of interaction design that captures these activities and characteristics.

The main aims of this chapter are to:
• Consider what ‘doing’ interaction design involves.
• Ask and provide answers for some important questions about the interaction design process.

6.2 What is interaction design about?

There are many fields of design, for example graphic design, architectural design, industrial and software design. Each discipline has its own interpretation of “designing.” We are not going to debate these different interpretations here, as we are focussing on interaction design, but a general definition of “design” is informative in beginning to understand what it’s about. The definition of design from the Oxford English Dictionary captures the essence of design very well: “(design is) a plan or scheme conceived in the mind and intended for subsequent execution.” The act of designing therefore involves the development of such a plan or scheme. For the plan or scheme to have a hope of ultimate execution, it has to be informed with knowledge about its use and the target domain, together with practical constraints such as materials, cost, and feasibility. For example, if we conceived of a plan for building multi-level roads in order to overcome traffic congestion, before the plan could be executed we would have to consider drivers’ attitudes to using such a construction, the viability of the structure, engineering constraints affecting its feasibility, and cost concerns.

In interaction design, we investigate the artifact’s use and target domain by taking a user-centered approach to development. This means that users’ concerns direct the development rather than technical concerns.

Design is also about trade-offs, about balancing conflicting requirements. If we take the roads plan again, there may be very strong environmental arguments for stacking roads higher (less countryside would be destroyed), but these must be balanced against engineering and financial limitations that make the proposition less attractive. Getting the balance right requires experience, but it also requires the development and evaluation of alternative solutions. Generating alternatives is a key principle in most design disciplines, and one that should be encouraged in interaction design. As Marc Rettig suggested: “To get a good idea, get lots of ideas” (Rettig, 1994). However, this is not necessarily easy, and unlike many design disciplines, interaction designers are not generally trained to generate alternative designs. However, the ability to brainstorm and contribute alternative ideas can be learned, and techniques from other design disciplines can be successfully used in interaction design. For example, Danis and Boies (2000) found that using techniques from graphic design that encouraged the generation of alternative designs stimulated innovative interactive systems design. See also the interview with Gillian Crampton Smith at the end of this chapter for her views on how other aspects of traditional design can help produce good interaction design.

Although possible, it is unlikely that just one person will be involved in developing and using a system and therefore the plan must be communicated. This requires it to be captured and expressed in some suitable form that allows review, revision, and improvement. There are many ways of doing this, one of the simplest being to produce a series of sketches. Other common approaches are to write a description in natural language, to draw a series of diagrams, and to build prototypes. A combination of these techniques is likely to be the most effective. When users are involved, capturing and expressing a design in a suitable format is especially important since they are unlikely to understand jargon or specialist notations. In fact, a form that users can interact with is most effective, and building prototypes of one form or another (see Chapter 8) is an extremely powerful approach.

So interaction design involves developing a plan which is informed by the product’s intended use, target domain, and relevant practical considerations. Alternative designs need to be generated, captured, and evaluated by users. For the evaluation to be successful, the design must be expressed in a form suitable for users to interact with.

ACTIVITY 6.1

Imagine that you want to design an electronic calendar or diary for yourself. You might use this system to plan your time, record meetings and appointments, mark down people’s birthdays, and so on, basically the kinds of things you might do with a paper-based calendar. Draw a sketch of the system outlining its functionality and its general look and feel. Spend about five minutes on this.

Having produced an outline, now spend five minutes reflecting on how you went about tackling this activity. What did you do first? Did you have any particular artifacts or experience to base your design upon? What process did you go through?

Comment

The sketch I produced is shown in Figure 6.1. As you can see, I was quite heavily influenced by the paper-based books I currently use! I had in mind that this calendar should allow me to record meetings and appointments, so I need a section representing the days and months. But I also need a section to take notes. I am a prolific note-taker, and so for me this was a key requirement. Then I began to wonder about how I could best use hyperlinks. I certainly want to keep addresses and telephone numbers in my calendar, so maybe there could be a link between, say, someone’s name in the calendar and their entry in my address book that will give me their contact details when I need them? But I still want the ability to be able to turn page by page, for when I’m scanning or thinking about how to organize my time. A search facility would be useful too.

The first thing that came into my head when I started doing this was my own paper-based book where I keep appointments, maps, telephone numbers, and other small notes. I also thought about my notebook and how convenient it would be to have the two combined. Then I sat and sketched different ideas about how it might look (although I’m not very good at sketching). The sketch in Figure 6.1 is the version I’m happiest with. Note that my sketch has a strong resemblance to a paper-based book, yet I’ve also tried to incorporate electronic capabilities. Maybe once I have evaluated this design and ensured that the tasks I want to perform are supported, then I will be more receptive to changing the look away from a paper-based “look and feel.”

Figure 6.1 My sketch of the electronic calendar: month and day views (“9:30 Meeting John,” “To do: contact Daniel”), a link to the notes section, a control to turn to the next page, and links that are always available

The exact steps taken to produce a product will vary from designer to designer, from product to product, and from organization to organization. In this activity, you may have started by thinking about what you’d like such a system to do for you, or you may have been thinking about an existing paper calendar. You may have mixed together features of different systems or other record-keeping support. Having got or arrived at an idea of what you wanted, maybe you then imagined what it might look like, either through sketching with paper and pencil or in your mind.

6.2.1 Four basic activities of interaction design

Four basic activities for interaction design were introduced in Chapter 1, some of which you will have engaged in when doing Activity 6.1. These are: identifying needs and establishing requirements, developing alternative designs that meet those requirements, building interactive versions so that they can be communicated and assessed, and evaluating them, i.e., measuring their acceptability. They are fairly generic activities and can be found in other design disciplines too. For example, in architectural design (RIBA, 1988) basic requirements are established in a work stage called “inception,” alternative design options are considered in a “feasibility” stage, and “the brief” is developed through outline proposals and scheme design. During this time, prototypes may be built or perspectives may be drawn to give clients a better indication of the design being developed. Detail design specifies all components, and working drawings are produced. Finally, the job arrives on site and building commences.

We will be expanding on each of the basic activities of interaction design in the next two chapters. Here we give only a brief introduction to each.

Identifying needs and establishing requirements

In order to design something to support people, we must know who our target users are and what kind of support an interactive product could usefully provide. These needs form the basis of the product’s requirements and underpin subsequent design and development. This activity is fundamental to a user-centered approach, and is very important in interaction design; it is discussed further in Chapter 7.

Developing alternative designs

This is the core activity of designing: actually suggesting ideas for meeting the requirements. This activity can be broken up into two sub-activities: conceptual design and physical design. Conceptual design involves producing the conceptual model for the product; a conceptual model describes what the product should do, how it should behave, and what it should look like. Physical design considers the detail of the product including the colors, sounds, and images to use, menu design, and icon design. Alternatives are considered at every point. You met some of the ideas for conceptual design in Chapter 2; we go into more detail about conceptual and physical design in Chapter 8.

Building interactive versions of the designs

Interaction design involves designing interactive products. The most sensible way for users to evaluate such designs, then, is to interact with them. This requires an interactive version of the designs to be built, but that does not mean that a software version is required. There are different techniques for achieving “interaction,” not all of which require a working piece of software. For example, paper-based prototypes are very quick and cheap to build and are very effective for identifying problems in the early stages of design, and through role-playing users can get a real sense of what it will be like to interact with the product. This aspect is also covered in Chapter 8.

Evaluating designs

Evaluation is the process of determining the usability and acceptability of the product or design, measured in terms of a variety of criteria including the number of errors users make using it, how appealing it is, how well it matches the requirements, and so on. Interaction design requires a high level of user involvement throughout development, and this enhances the chances of an acceptable product being delivered. In most design situations you will find a number of activities concerned with quality assurance and testing to make sure that the final product is “fit-for-purpose.” Evaluation does not replace these activities, but complements and enhances them. We devote Chapters 10 through 14 to the important subject of evaluation.

The activities of developing alternative designs, building interactive versions of the design, and evaluation are intertwined: alternatives are evaluated through the interactive versions of the designs and the results are fed back into further design. This iteration is one of the key characteristics of the interaction design process, which we introduced in Chapter 1.

6.2.2 Three key characteristics of the interaction design process

There are three characteristics that we believe should form a key part of the interaction design process. These are: a user focus, specific usability criteria, and iteration.

The need to focus on users has been emphasized throughout this book, so you will not be surprised to see that it forms a central plank of our view on the interaction design process. While a process cannot, in itself, guarantee that a development will involve users, it can encourage focus on such issues and provide opportunities for evaluation and user feedback.

Specific usability and user experience goals should be identified, clearly documented, and agreed upon at the beginning of the project. They help designers to choose between different alternative designs and to check on progress as the product is developed.

Iteration allows designs to be refined based on feedback. As users and designers engage with the domain and start to discuss requirements, needs, hopes and aspirations, different insights into what is needed, what will help, and what is feasible will emerge. This leads to a need for iteration, for the activities to inform each other and to be repeated. However good the designers are and however clear the users may think their vision is of the required artifact, it will be necessary to revise ideas in light of feedback, several times. This is particularly true if you are trying to innovate. Innovation rarely emerges whole and ready to go. It takes time, evolution, trial and error, and a great deal of patience. Iteration is inevitable because designers never get the solution right the first time (Gould and Lewis, 1985).

We shall return to these issues and expand upon them in Chapter 9.

6.3 Some practical issues

Before we consider how the activities and key characteristics of interaction design can be pulled together into a coherent process, we want to consider some questions highlighted by the discussion so far. These questions must be answered if we are going to be able to “do” interaction design in practice. They are: Who are the users? What do we mean by “needs”? How do you generate alternative designs? And how do you choose among alternative designs?

6.3.1 Who are the users?

In Chapter 1, we said that an overarching objective of interaction design is to optimize the interactions people have with computer-based products, and that this requires us to support needs, match wants, and extend capabilities. We also stated above that the activity of identifying these needs and establishing requirements was fundamental to interaction design. However, we can’t hope to get very far with this intent until we know who the users are and what they want to achieve. As a starting point, therefore, we need to know who we consult to find out the users’ requirements and needs.

Identifying the users may seem a straightforward task, but in fact there are many interpretations of “user.” The most obvious definition is those people who interact directly with the product to achieve a task. Most people would agree with this definition; however, there are others who can also be thought of as users. For example, Holtzblatt and Jones (1993) include in their definition of “users” those who manage direct users, those who receive products from the system, those who test the system, those who make the purchasing decision, and those who use competitive products. Eason (1987) identifies three categories of user: primary, secondary and tertiary. Primary users are those likely to be frequent hands-on users of the system; secondary users are occasional users or those who use the system through an intermediary; and tertiary users are those affected by the introduction of the system or who will influence its purchase.
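Eason’s taxonomy is easy to record explicitly when cataloging who to consult. The sketch below is illustrative only — the class names and the example users are ours, not from the text:

```python
from dataclasses import dataclass
from enum import Enum

class UserCategory(Enum):
    """Eason's (1987) three categories of user."""
    PRIMARY = "frequent hands-on user"
    SECONDARY = "occasional user, or uses the system through an intermediary"
    TERTIARY = "affected by the system's introduction, or influences its purchase"

@dataclass
class User:
    role: str
    category: UserCategory

# Hypothetical users of a supermarket check-out system
users = [
    User("check-out operator", UserCategory.PRIMARY),
    User("store manager", UserCategory.SECONDARY),
    User("supermarket owner", UserCategory.TERTIARY),
]

# Primary users are the first people to consult about hands-on requirements
primary = [u.role for u in users if u.category is UserCategory.PRIMARY]
```

Making the categories explicit like this is one way to check that each category has actually been consulted, not just the hands-on users.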

The trouble is that there is a surprisingly wide collection of people who all have a stake in the development of a successful product. These people are called stakeholders. Stakeholders are “people or organizations who will be affected by the system and who have a direct or indirect influence on the system requirements” (Kotonya and Sommerville, 1998). Dix et al. (1993) make an observation that is very pertinent to a user-centered view of development: “It will frequently be the case that the formal ‘client’ who orders the system falls very low on the list of those affected. Be very wary of changes which take power, influence or control from some stakeholders without returning something tangible in its place.”

Generally speaking, the group of stakeholders for a particular product is going to be larger than the group of people you’d normally think of as users, although it will of course include users. Based on the definition above, we can see that the group of stakeholders includes the development team itself as well as its managers, the direct users and their managers, recipients of the product’s output, people who may lose their jobs because of the introduction of the new product, and so on.

For example, consider again the calendar system in Activity 6.1. According to the description we gave you, the user group for the system has just one member: you. However, the stakeholders for the system would also include people you make appointments with, people whose birthdays you remember, and even companies that produce paper-based calendars, since the introduction of an electronic calendar may increase competition and force them to operate differently.

This last point may seem a little exaggerated for just one system, but if you think of others also migrating to an electronic version, and abandoning their paper calendars, then you can see how the companies may be affected by the introduction of the system.

The net of stakeholders is really quite wide! We do not suggest that you need to involve all of the stakeholders in your user-centered approach, but it is important to be aware of the wider impact of any product you are developing. Identifying the stakeholders for your project means that you can make an informed decision about who should be involved and to what degree.

ACTIVITY 6.2

Who do you think are the stakeholders for the check-out system of a large supermarket?

Comment

First, there are the check-out operators. These are the people who sit in front of the machine and pass the customers’ purchases over the bar code reader, receive payment, hand over receipts, etc. Their stake in the success and usability of the system is fairly clear and direct. Then you have the customers, who want the system to work properly so that they are charged the right amount for the goods, receive the correct receipt, and are served quickly and efficiently. Also, the customers want the check-out operators to be satisfied and happy in their work so that they don’t have to deal with a grumpy assistant. Outside of this group, you then have supermarket managers and supermarket owners, who also want the assistants to be happy and efficient and the customers to be satisfied and not complaining. They also don’t want to lose money because the system can’t handle the payments correctly. Other people who will be affected by the success of the system include other supermarket employees such as warehouse staff, supermarket suppliers, supermarket owners’ families, and local shop owners whose business would be affected by the success or failure of the system. We wouldn’t suggest that you should ask the local shop owner about requirements for the supermarket check-out system. However, you might want to talk to warehouse staff, especially if the system links in with stock control or other functions.

6.3.2 What do we mean by “needs”?

If you had asked someone in the street in the late 1990s what she “needed,” I doubt that the answer would have included interactive television, or a jacket wired for communication, or a smart fridge. If you had presented the same person with these possibilities and asked whether she would buy them if they were available, then the answer might have been different. When we talk about identifying needs, therefore, it’s not simply a question of asking people, “What do you need?” and then supplying it, because people don’t necessarily know what is possible (see Suzanne Robertson’s interview at the end of Chapter 7 for “un-dreamed-of” requirements). Instead, we have to approach it by understanding the characteristics and capabilities of the users, what they are trying to achieve, how they achieve it currently, and whether they would achieve their goals more effectively if they were supported differently.

There are many dimensions along which a user’s capabilities and characteristics may vary, and that will have an impact on the product’s design. You have met some of these in Chapter 3. For example, a person’s physical characteristics may affect the design: size of hands may affect the size and positioning of input buttons, and motor abilities may affect the suitability of certain input and output devices; height is relevant in designing a physical kiosk, for example; and strength in designing a child’s toy (a toy should not require too much strength to operate, but may require strength greater than expected for the target age group to change batteries or perform other operations suitable only for an adult). Cultural diversity and experience may affect the terminology the intended user group is used to, or how nervous about technology a set of users may be.

If a product is a new invention, then it can be difficult to identify the users and representative tasks for them; e.g., before microwave ovens were invented, there were no users to consult about requirements and there were no representative tasks to identify. Those developing the oven had to imagine who might want to use such an oven and what they might want to do with it.

It may be tempting for designers simply to design what they would like, but their ideas would not necessarily coincide with those of the target user group. It is imperative that representative users from the real target group be consulted. For example, a company called Netpliance was developing a new “Internet appliance,” i.e., a product that would seamlessly integrate all the services necessary for the user to achieve a specific task on the Internet (Isensee et al., 2000). They took a user-centered approach and employed focus group studies and surveys to understand their customers’ needs. The marketing department led these efforts, but developers observed the focus groups to learn more about their intended user group. Isensee et al. (p. 60) observe that “It is always tempting for developers to create products they would want to use or similar to what they have done before. However, in the Internet appliance space, it was essential to develop for a new audience that desires a simpler product than the computer industry has previously provided.”

In these circumstances, a good indication of future behavior is current or past behavior. So it is always useful to start by understanding similar behavior that is already established. Apart from anything else, introducing something new into people’s lives, especially a new “everyday” item such as a microwave oven, requires a culture change in the target user population, and it takes a long time to effect a culture change. For example, before cell phones were so widely available there were no users and no representative tasks available for study, per se. But there were standard telephones, and so understanding the tasks people perform with, and in connection with, standard telephones was a useful place to start. Apart from making a telephone call, users also look up people’s numbers, take messages for others not currently available, and find out the number of the last person to ring them. These kinds of behavior have been translated into memories for the telephone, answering machines, and messaging services for mobiles. In order to maximize the benefit of e-commerce sites, traders have found that referring back to customers’ non-electronic habits and behaviors can be a good basis for enhancing e-commerce activity (CHI panel, 2000; Lee et al., 2000).

6.3.3 How do you generate alternative designs?

A common human tendency is to stick with something that we know works. We probably recognize that a better solution may exist out there somewhere, but it’s very easy to accept this one because we know it works: it’s “good enough.” Settling for a solution that is good enough is not, in itself, necessarily “bad,” but it may be undesirable because good alternatives may never be considered, and considering alternative solutions is a crucial step in the process of design. But where do these alternative ideas come from?

One answer to this question is that they come from the individual designer’s flair and creativity. While it is certainly true that some people are able to produce wonderfully inspired designs while others struggle to come up with any ideas at all, very little in this world is completely new. Normally, innovations arise through cross-fertilization of ideas from different applications, the evolution of an existing product through use and observation, or straightforward copying of other, similar products. For example, if you think of something commonly believed to be an “invention,” such as the steam engine, this was in fact inspired by the observation that the steam from a kettle boiling on the stove lifted the lid. Clearly there was an amount of creativity and engineering involved in making the jump from a boiling kettle to a steam engine, but the kettle provided the inspiration to translate experience gained in one context into a set of principles that could be applied in another. As an example of evolution, consider the word processor. The capabilities of suites of office software have gradually increased from the time they first appeared. Initially, a word processor was just an electronic version of a typewriter, but gradually other capabilities, including the spell-checker, thesaurus, style sheets, and graphical capabilities, were added.

So although creativity and invention are often wrapped in mystique, we do understand something of the process and of how creativity can be enhanced or inspired. We know, for instance, that browsing a collection of designs will inspire designers to consider alternative perspectives, and hence alternative solutions. The field of case-based reasoning (Maher and Pu, 1997) emerged from the observation that designers solve new problems by drawing on knowledge gained from solving previous similar problems. As Schank (1982, p. 22) puts it, “An expert is someone who gets reminded of just the right prior experience to help him in processing his current experiences.” And while those experiences may be the designer’s own, they can equally well be others’.

A more pragmatic answer to this question, then, is that alternatives come from looking at other, similar designs, and the process of inspiration and creativity can be enhanced by prompting a designer’s own experience and by looking at others’ ideas and solutions. Deliberately seeking out suitable sources of inspiration is a valuable step in any design process. These sources may be very close to the intended new product, such as competitors’ products, or they may be earlier versions of similar systems, or something completely different.

ACTIVITY 6.3

Consider again the calendar system introduced at the beginning of the chapter. Reflecting on the process again, what do you think inspired your outline design? See if you can identify any elements within it that you believe are truly innovative.

Comment

For my design, I haven’t seen an electronic calendar, although I have seen plenty of other software-based systems. My main sources of inspiration were my current paper-based books.

Some of the things you might have been thinking of include your existing paper-based calendar, and other pieces of software you commonly use and find helpful or easy to use in some way. Maybe you already have access to an electronic calendar, which will have given you some ideas, too. However, there are probably other aspects that make the design somehow unique to you and may be innovative to a greater or lesser degree.

All this having been said, under some circumstances the scope to consider alternative designs may be limited. Design is a process of balancing constraints and constantly trading off one set of requirements with another, and the constraints may be such that there are very few viable alternatives available. For example, if you are designing a software system to run under the Windows operating system, then elements of the design will be prescribed because you must conform to the Windows “look and feel,” and to other constraints intended to make Windows programs consistent for the user. We shall return to style guides and standards in Chapter 8.

If you are producing an upgrade to an existing system, then you may face other constraints, such as wanting to keep the familiar elements of it and retain the same “look and feel.” However, this is not necessarily a rigid rule. Kent Sullivan reports that when designing the Windows 95 operating system to replace the Windows 3.1 and Windows for Workgroups 3.11 operating systems, they initially focused too much on consistency with the earlier versions (Sullivan, 1996).

BOX 6.1

A Box Full of Ideas

The innovative product design company IDEO was introduced in Chapter 1. It has been involved in the development of many artifacts including the first commercial computer mouse and the PalmPilot V. Underlying some of their creative flair is a collection of weird and wonderful engineering housed in a large flatbed filing cabinet called the TechBox (see Figure 6.2). The TechBox holds around 200 gizmos and interesting materials, divided into categories: “Amazing Materials,” “Cool Mechanisms,” “Interesting Manufacturing Processes,” “Electronic Technologies,” and “Thermal and Optical.” Each item has been placed in the box because it represents a neat idea or a new process. Staff at IDEO take along a selection of items from the TechBox to brainstorming meetings. The items may be chosen because they provide useful visual props or possible solutions to a particular issue, or simply to provide some light relief. Each item is clearly labeled with its name and category, but further information can be found by accessing the TechBox’s online catalog. Each item has its own page detailing what the item is, why it’s interesting, where it came from, and who has used it or knows more about it. For example, the page in Figure 6.3 relates to a metal injection-molding technique. Other items in the box include an example of metal-coated wood, and materials with and without holes that stretch, bend, and change shape or color at different temperatures. Each TechBox has its own curator who is responsible for maintaining and cataloging the items and for promoting its use within the office. Anyone can submit a new item for consideration, and as items become commonplace, they are removed from the TechBox to make way for the next generation of fascinating contraptions.

Figure 6.2

Figure 6.3

How are these things used? Well, here is one example from Patrick Hall at the London IDEO office (see Figure 6.4):

IDEO was asked to review the design of a mass-produced hand-held medical product that was deemed to be too big.

Figure 6.4 (a), (b), (c)

DILEMMA

Designers draw on their experience of design when approaching a new project. This includes the use of previous designs that they know work, both designs they have created themselves and those that others have created. Others’ creations often spark inspiration that leads to new ideas and innovation. This is well known and understood. However, the expression of an idea is protected by copyright, and people who infringe that copyright can be taken to court and prosecuted. Note that copyright covers the expression of an idea and not the idea itself. This means, for example, that while there are numerous word processors all with similar functionality, this does not represent an infringement of copyright, as the idea has been expressed in different ways and it is the expression that has been copyrighted. Copyright is free and is automatically invested in the author of something, e.g., the writer of a book or a programmer who develops a program, unless he signs the copyright over to someone else. Authors writing for academic journals are often asked to sign over their copyright to the publisher of the journal. Various limitations and special conditions can apply, but basically the copyright is no longer theirs. People who produce something through their employment, such as programs or products, may have in their employment contract a statement saying that the copyright relating to anything produced in the course of that employment is automatically assigned to the employer and does not remain with the employee.

On the other hand, patenting is an alternative to copyright that does protect the idea rather than the expression. There are various forms of patenting, each of which is designed to allow the inventor the chance to capitalize on an idea. It is unusual for software to be patented, since it is a long, slow, and expensive process, although there is a recent trend towards patenting business processes. For example, Amazon, the online bookstore, has patented its “one-click” purchasing process, which allows regular users simply to choose a book and buy it with one mouse click (US Patent No. 5960411, September 29, 1999). This is possible because the system stores its customers’ details and “recognizes” them when they access the site again.

So the dilemma comes in knowing when it’s OK to use someone else’s work as a source of inspiration and when you are infringing copyright or patent law. The issues around this question are complex and detailed, and well beyond the scope of this book, but more information and examples of law cases that have been brought successfully and unsuccessfully can be found in Bainbridge (1999).

6.3.4 How do you choose among alternative designs?

Choosing among alternatives is about making design decisions: Will the device use keyboard entry or a touch screen? Will the device provide an automatic memory function or not? These decisions will be informed by the information gathered about users and their tasks, and by the technical feasibility of an idea. Broadly speaking, though, the decisions fall into two categories: those that are about externally visible and measurable features, and those that are about characteristics internal to the system that cannot be observed or measured without dissecting it. For example, externally visible and measurable factors for a building design include the ease of access to the building, the amount of natural light in rooms, the width of corridors, and the number of power outlets. In a photocopier, externally visible and measurable factors include the physical size of the machine, the speed and quality of copying, the different sizes of paper it can use, and so on. Underlying each of these factors are other considerations that cannot be observed or studied without dissecting the building or the machine. For example, the number of power outlets will depend on how the wiring within the building is designed and the capacity of the main power supply; the choice of materials used in a photocopier may depend on its friction rating and how much it deforms under certain conditions.

In an interactive product there are similar factors that are externally visible and measurable and those that are hidden from the users’ view. For example, exactly why the response time for a query to a database (or a web page) is, say, 4 seconds will almost certainly depend on technical decisions made when the database was constructed, but from the users’ viewpoint the important observation is the fact that it does take 4 seconds to respond.
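Because only the externally visible figure matters to the user, it can be measured without dissecting the system at all. A minimal sketch of such a measurement follows; `run_query` is a hypothetical stand-in for whatever operation is being timed, not anything from the text:

```python
import time

def measure_response_time(operation, *args):
    """Return (result, seconds elapsed) for one externally visible operation,
    as a user would perceive it; the internal workings stay a black box."""
    start = time.perf_counter()
    result = operation(*args)
    return result, time.perf_counter() - start

# Example: time a stand-in "query" (illustrative only)
def run_query(n):
    return sum(range(n))

result, elapsed = measure_response_time(run_query, 100)
```

The point of the sketch is that the measurement wraps the operation from the outside, exactly as a stopwatch-wielding observer would.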

In interaction design, the way in which the users interact with the product is considered the driving force behind the design, and so we concentrate on the externally visible and measurable behavior. Detailed internal workings are important only to the extent that they affect the external behavior. This does not mean that design decisions concerning a system’s internal behavior are any less important; however, the tasks that the user will perform should influence design decisions no less than technical issues.

So, one answer to the question posed above is that we choose between alternative designs by letting users and stakeholders interact with them and by discussing their experiences, preferences and suggestions for improvement. This is fundamental to a user-centered approach to development. This in turn means that the designs must be available in a form that can be reasonably evaluated with users, not in technical jargon or notation that seems impenetrable to them.

One form traditionally used for communicating a design is documentation, e.g., a description of how something will work or a diagram showing its components. The trouble is that a static description cannot capture the dynamics of behavior, and for an interaction device we need to communicate to the users what it will be like to actually operate it.

In many design disciplines, prototyping is used to overcome potential client misunderstandings and to test the technical feasibility of a suggested design and its production. Prototyping involves producing a limited version of the product with the purpose of answering specific questions about the design’s feasibility or appropriateness. Prototypes give a better impression of the user experience than simple descriptions can ever do, and there are different kinds of prototyping that are suitable for different stages of development and for eliciting different kinds of information. One experience illustrating the benefits of prototyping is described in Box 6.2. So one important aspect of choosing among alternatives is that prototypes should be built and evaluated by users. We’ll revisit the issue of prototyping in Chapter 8.

Another basis on which to choose between alternatives is “quality,” but this requires a clear understanding of what “quality” means. People’s views of what counts as a quality product vary, and we don’t always write them down. Whenever we use anything we have some notion of the level of quality we are expecting, wanting, or needing. Whether this level of quality is expressed formally or informally does not matter. The point is that it exists and we use it consciously or subconsciously to evaluate alternative items. For example, if you have to wait too long to download

BOX 6.2

The Value of Prototyping

I learned the value of a prototype through a very effective role-playing exercise. I was on a course designed to introduce new graduates to different possible careers in industry. One of the themes was production and manufacturing, and the aim of one group exercise was to produce a notebook. Each group was told that it had 30 minutes to deliver 10 books to the person in charge. Groups were given various pieces of paper, scissors, sticky tape, staples, etc., and told to organize ourselves as best we could. So my group set to work organizing ourselves into a production line, with one of us cutting up the paper, another stapling the pages together, another sealing the binding with the sticky tape, and so on. One person was even in charge of quality assurance. It took us less than 10 minutes to produce the 10 books, and we rushed off with our delivery. When we showed the person in charge, he replied, “That’s not what I wanted, I need it bigger than that.” Of course, the size of the notebook wasn’t specified in the description of the task, so we found out how big he wanted it, got some more materials, and scooted back to produce 10 more books. Again, we set up our production line and produced 10 books to the correct size. On delivery we were again told that it was not what was required: he wanted the binding to work the other way around. This time we got as many of the requirements as we could and went back, developed one book, and took that back for further feedback and refinement before producing the 10 required.

If we had used prototyping as a way of exploring our ideas and checking requirements in the first place, we could have saved so much effort and resource!

a web page, then you are likely to give up and try a different site: you are applying a certain measure of quality associated with the time taken to download the web page. If one cell phone makes it easy to perform a critical function while another involves several complicated key sequences, then you are likely to buy the former rather than the latter. You are applying a quality criterion concerned with efficiency.

Now, if you are the only user of a product, then you don’t necessarily have to express your definition of “quality,” since you don’t have to communicate it to anyone else. However, as we have seen, most projects involve many different stakeholder groups, and you will find that each of them has a different definition of quality and different acceptable limits for it. For example, although all stakeholders may agree on targets such as “response time will be fast” or “the menu structure will be easy to use,” exactly what each of them means by this is likely to vary. Disputes are inevitable when, later in development, it transpires that “fast” to one set of stakeholders meant “under a second,” while to another it meant “between 2 and 3 seconds.” Capturing these different views in clear, unambiguous language early in development takes you halfway to producing a product that will be regarded as “good” by all your stakeholders. It helps to clarify expectations, provides a benchmark against which products of the development process can be measured, and gives you a basis on which to choose among alternatives.

The process of writing down formal, verifiable (and hence measurable) usability criteria is a key characteristic of an approach to interaction design called usability engineering that has emerged over many years and with various proponents (Whiteside et al., 1988; Nielsen, 1993). Usability engineering involves specifying quantifiable measures of product performance, documenting them in a usability specification, and assessing the product against them. One way in which this approach is used is to make changes to subsequent versions of a system based on feedback from carefully documented results of usability tests for the earlier version. We shall return to this idea later when we discuss evaluation.

ACTIVITY 6.4

Consider the calendar system that you designed in Activity 6.1. Suggest some usability criteria that you could use to determine the calendar’s quality. You will find it helpful to think in terms of the usability goals introduced in Chapter 1: effectiveness, efficiency, safety, utility, learnability, and memorability. Be as specific as possible. Check your criteria by considering exactly what you would measure and how you would measure its performance.

Having done that, try to do the same thing for the user experience goals introduced in Chapter 1; these relate to whether a system is satisfying, enjoyable, motivating, rewarding, and so on.

Comment

Finding measurable characteristics for some of these is not easy. Here are some suggestions, but you may have found others. Note that the criteria must be measurable and very specific.
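A usability specification of the kind usability engineering calls for can be recorded as data, so that each criterion is checkable against test results. The sketch below is illustrative only; the class, the criteria, and the target numbers are invented for the calendar example, not taken from the text:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class UsabilityCriterion:
    """One measurable entry in a usability specification."""
    goal: str               # e.g. "efficiency" or "learnability"
    measure: str            # exactly what is measured
    target: float           # acceptable upper limit
    observed: Optional[float] = None  # filled in after usability testing

    def met(self) -> bool:
        # A criterion is met only once an observation exists and is in range
        return self.observed is not None and self.observed <= self.target

# Invented criteria for the calendar system of Activity 6.1
spec = [
    UsabilityCriterion("efficiency", "seconds to add an appointment", 10.0),
    UsabilityCriterion("learnability",
                       "minutes for a first-time user to add an appointment", 5.0),
]

spec[0].observed = 7.5   # result of a (hypothetical) usability test
unmet = [c.measure for c in spec if not c.met()]
```

Writing criteria this way forces the specificity the activity asks for: each entry names what is measured, and the benchmark is explicit enough to settle later disputes about what “fast” meant.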

6.4 Lifecycle models: showing how the activities are related

Understanding what activities are involved in interaction design is the first step to being able to do it, but it is also important to consider how the activities are related to one another so that the full development process can be seen. The term lifecycle model is used to represent a model that captures a set of activities and how they are related. Sophisticated models also incorporate a description of when and how to move from one activity to the next and a description of the deliverables for each activity. The reason such models are popular is that they allow developers, and particularly managers, to get an overall view of the development effort so that progress can be tracked, deliverables specified, resources allocated, targets set, and so on.

Existing models have varying levels of sophistication and complexity. For projects involving only a few experienced developers, a simple process would probably be adequate. However, for larger systems involving tens or hundreds of developers with hundreds or thousands of users, a simple process just isn’t enough to provide the management structure and discipline necessary to engineer a usable product. So something is needed that will provide more formality and more discipline. Note that this does not mean that innovation is lost or that creativity is stifled. It just means that a structured process is used to provide a more stable framework for creativity.

However simple or complex it appears, any lifecycle model is a simplified version of reality. It is intended as an abstraction and, as with any good abstraction, only the amount of detail required for the task at hand should be included. Any organization wishing to put a lifecycle model into practice will need to add detail specific to its particular circumstances and culture. For example, Microsoft wanted to maintain a small-team culture while also making possible the development of very large pieces of software. To this end, they have evolved a process that has been called “synch and stabilize,” as described in Box 6.3.

In the next subsection, we introduce our view of what a lifecycle model for interaction design might look like that incorporates the four activities and the three key characteristics of the interaction design process discussed above. This will form the basis of our discussion in Chapters 7 and 8. Depending on the kind of system being developed, it may not be possible or appropriate to follow this model for every element of the system, and it is certainly true that more detail would be required to put the lifecycle into practice in a real project.

Many other lifecycle models have been developed in fields related to interaction design, such as software engineering and HCI, and our model is evolved from these ideas. To put our interaction design model into context we include here a description of five lifecycle models, three from software engineering and two from HCI, and consider how they relate to it.

1Sommerville (2001) uses the term process model to mean what we mean by lifecycle model, and refers to the waterfall model as the software lifecycle. Pressman (1992) talks about paradigms. In HCI the term “lifecycle model” is used more widely. For this reason, and because others use “process model” to represent something that is more detailed than a lifecycle model (e.g., Comer, 1997) we have chosen to use lifecycle model.

BOX 6.3

How Microsoft Builds Software (Cusumano and Selby, 1997)

Microsoft is one of the largest software companies in the world and builds some very complex software; for example, Windows 95 contains more than 11 million lines of code and required more than 200 programmers. Over a two-and-a-half year period from the beginning of 1993, two researchers, Michael Cusumano and Richard Selby, were given access to Microsoft project documents and key personnel for study and interview. Their aim was to build up an understanding of how Microsoft produces software. Rather than adopt the structured software engineering practices others have followed, Microsoft’s strategy has been to cultivate entrepreneurial flexibility throughout its software teams. In essence, it has tried to scale up the culture of a loosely-structured, small software team. “The objective is to get many small teams (three to eight developers each) or individual programmers to work together as a single relatively large team in order to build large products relatively quickly while still allowing individual programmers and teams freedom to evolve their designs and operate nearly autonomously” (p. 54).

In order to maintain consistency and to ensure that products are eventually shipped, the teams synchronize their activities daily and periodically stabilize the whole product. Cusumano and Selby have therefore labeled Microsoft’s unique process “synch and stabilize.” Figure 6.5 shows an overview of this process, which is divided into three phases: the planning phase, the development phase, and the stabilization phase. The planning phase begins with a vision statement that defines the goals of the new product and the user activities to be supported by the product. (Microsoft uses a method called activity-based planning to identify and prioritize the features to be built; we return to this in Chapter 9.) The program managers together with the developers then write a functional specification in enough detail to describe features and to develop schedules and allocate staff. The feature list in this document will change by about 30% during the course of development, so the list is not fixed at this time.

In the next phase, the development phase, the feature list is divided into three or four parts, each with its own small development team, and the schedule is divided into sequential subprojects, each with its own deadline (milestone). The teams work in parallel on a set of features and synchronize their work by putting together their code and finding errors on a daily and weekly basis. This is necessary because many programmers may be working on the same code at once. For example, during the peak development of Excel 3.0, 34 developers were actively changing the same source code on a daily basis. At the end of a subproject, i.e., on reaching a milestone, all errors are found and fixed, thus stabilizing the product, before moving on to the next subproject and eventually to the final milestone, which represents the release date. Figure 6.6 shows an overview of the milestone structure for a project with three subprojects. This synch-and-stabilize approach has been used to develop Excel, Office, Publisher, Windows 95, Windows NT, Word, and Works, among others.

Figure 6.5 An overview of the synch-and-stabilize process. Planning phase: define product vision, specifications, and schedule (a vision statement; a specification document, in which program management and the development group define feature functionality, architectural issues, and component interdependencies; and schedule and feature team formation, in which program management coordinates the schedule and arranges feature teams that each contain approximately 1 program manager, 3–8 developers, and 3–8 testers who work in parallel 1:1 with developers). Development phase: feature development in 3 or 4 sequential subprojects that each results in a milestone release.

6.4.1 A simple lifecycle model for interaction design

We see the activities of interaction design as being related as shown in Figure 6.7. This model incorporates iteration and encourages a user focus. While the outputs from each activity are not specified in the model, you will see in Chapter 7 that our description of establishing requirements includes the need to identify specific usability criteria.

The model is not intended to be prescriptive; that is, we are not suggesting that this is how all interactive products are or should be developed. It is based on our observations of interaction design and on information we have gleaned in the research for this book. It has its roots in the software engineering and HCI lifecycle models described below, and it represents what we believe is practiced in the field.

Most projects start with identifying needs and requirements. The project may have arisen because of some evaluation that has been done, but the lifecycle of the new (or modified) product can be thought of as starting at this point. From this activity, some alternative designs are generated in an attempt to meet the needs and requirements that have been identified. Then interactive versions of the designs are developed and evaluated. Based on the feedback from the evaluations, the team may need to return to identifying needs or refining requirements, or it may go straight into redesigning. It may be that more than one alternative design follows this iterative cycle in parallel with others, or it may be that one alternative at a time is considered. Implicit in this cycle is that the final product will emerge in an evolutionary fashion from a rough initial idea through to the finished product. Exactly how this evolution happens may vary from project to project, and we return to this issue in Chapter 8. The only factor limiting the number of times through the cycle is the resources available, but whatever the number is, development ends with an evaluation activity that ensures the final product meets the prescribed usability criteria.

Figure 6.7 A simple interaction design model: identify needs/establish requirements, (re)design, build an interactive version, and evaluate, iterating among these activities until the final product emerges.
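The cycle shown in Figure 6.7 can be sketched as a loop: (re)design, build an interactive version, evaluate, and repeat until the prescribed usability criteria are met or resources run out. The Python below is a toy rendering of that idea; the scoring scheme and resource budget are invented purely for illustration and are not part of the model itself.

```python
# A toy rendering of the simple interaction design lifecycle: iterate
# through (re)design, build, and evaluate until the prescribed usability
# criteria are met or resources are exhausted. The numbers are illustrative.

def establish_requirements():
    # in practice this yields usability criteria; here, one target score
    return {"target_score": 0.9}

def redesign(design):
    return design + 0.2       # each iteration improves the design a little

def build_interactive_version(design):
    return design             # the prototype embodies the current design

def evaluate(prototype):
    return prototype          # evaluation measures the prototype

requirements = establish_requirements()
design, iterations, budget = 0.0, 0, 10
while evaluate(build_interactive_version(design)) < requirements["target_score"]:
    if iterations >= budget:  # resources are the only limiting factor
        break
    design = redesign(design)
    iterations += 1

print(iterations)  # cycles of (re)design/build/evaluate before the criteria held
```

Note that the loop’s exit condition is an evaluation against the usability criteria, mirroring the point above that development ends with an evaluation activity.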

6.4.2 Lifecycle models in software engineering

Software engineering has spawned many lifecycle models, including the waterfall, the spiral, and rapid applications development (RAD). Before the waterfall was first proposed in 1970, there was no generally agreed approach to software development, but over the years since then, many models have been devised, reflecting in part the wide variety of approaches that can be taken to developing software. We choose to include these specific lifecycle models for two reasons: first, because they are representative of the models used in industry and they have all proved to be successful, and second, because they show how the emphasis in software development has gradually changed to include a more iterative, user-centered view.

The waterfall lifecycle model

The waterfall lifecycle was the first model generally known in software engineering and forms the basis of many lifecycles in use today. This is basically a linear model in which each step must be completed before the next step can be started (see Figure 6.8). For example, requirements analysis has to be completed before design can begin. The names given to these steps vary, as does the precise definition of each one, but basically, the lifecycle starts with some requirements analysis, moves into design, then coding, then implementation, testing, and finally maintenance. One of the main flaws with this approach is that requirements change over time, as businesses and the environment in which they operate change rapidly. This means that it does not make sense to freeze requirements for months, or maybe years, while the design and implementation are completed.

Figure 6.8 The waterfall lifecycle model: requirements analysis, design, code, test, and maintenance, with limited feedback between adjacent phases.

Some feedback to earlier stages was acknowledged as desirable and indeed practical soon after this lifecycle became widely used (Figure 6.8 does show some limited feedback between phases). But the idea of iteration was not embedded in the waterfall’s philosophy. Some level of iteration is now incorporated in most versions of the waterfall, and review sessions among developers are commonplace. However, the opportunity to review and evaluate with users was not built into this model.
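The waterfall’s defining constraint (each step must be completed before the next can start) can be expressed as a strict ordering rule. A minimal Python sketch, with phase names taken from Figure 6.8; the `next_phase` helper is hypothetical, our illustration rather than part of the model:

```python
# The waterfall's linearity, expressed as a rule: a phase may begin only
# when every earlier phase is complete. Phase names follow Figure 6.8.
PHASES = ["requirements analysis", "design", "code", "test", "maintenance"]

def next_phase(completed):
    """Return the next phase allowed, given the phases completed so far."""
    if completed != PHASES[:len(completed)]:
        raise ValueError("phases must be completed strictly in order")
    return PHASES[len(completed)] if len(completed) < len(PHASES) else None

print(next_phase([]))                         # requirements analysis
print(next_phase(["requirements analysis"]))  # design
# Note there is no transition back: once design starts, the pure model
# provides no step for revisiting requirements, which is its main flaw.
```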

The spiral lifecycle model

For many years, the waterfall formed the basis of most software developments, but in 1988 Barry Boehm (1988) suggested the spiral model of software development (see Figure 6.9). Two features of the spiral model are immediately clear from Figure 6.9: risk analysis and prototyping. The spiral model incorporates them in an iterative framework that allows ideas and progress to be repeatedly checked and evaluated. Each iteration around the spiral may be based on a different lifecycle model and may have different activities.

In the spiral’s case, it was not the need for user involvement that inspired the introduction of iteration but the need to identify and control risks. In Boehm’s approach, development plans and specifications that are focused on the risks involved in developing the system drive development, rather than the intended functionality, as was the case with the waterfall. Unlike the waterfall, the spiral explicitly encourages alternatives to be considered, and steps in which problems or potential problems are encountered to be re-addressed.

The spiral idea has been used by others for interactive devices (see Box 6.4). A more recent version of the spiral, called the WinWin spiral model (Boehm et al., 1998), explicitly incorporates the identification of key stakeholders and their respective “win” conditions, i.e., what will be regarded as a satisfactory outcome for each stakeholder group. A period of stakeholder negotiation to ensure a “win-win” result is included.

Figure 6.9 The spiral model of software development (after Boehm, 1988): each loop determines objectives, alternatives, and constraints; develops and verifies the next-level product; and plans the next phases, with review and commitment to proceed between loops.

Rapid Applications Development (RAD)

During the 1990s the drive to focus upon users became stronger and resulted in a number of new approaches to development. The Rapid Applications Development (RAD) approach (Millington and Stapleton, 1995) attempts to take a user-centered view and to minimize the risk caused by requirements changing during the course of the project. The ideas behind RAD began to emerge in the early 1990s, also in response to the inappropriate nature of the linear lifecycle models based on the waterfall. Two key features of a RAD project are:

• Time-limited cycles of approximately six months, at the end of which a system or partial system must be delivered. This is called time-boxing. In effect, this breaks down a large project into many smaller projects that can deliver products incrementally, and enhances flexibility in terms of the development techniques used and the maintainability of the final system.

• The use of JAD (Joint Application Development) workshops, in which users and developers come together to reach agreement on the requirements for the system.
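Time-boxing can be illustrated as packing a feature list into fixed-capacity increments, with anything that does not fit deferred to a later box. The feature names and effort figures in this Python sketch are invented; the greedy packing is one simple way to model the idea, not a prescribed RAD technique.

```python
# A sketch of RAD time-boxing: fixed-length cycles, each of which must
# deliver a (partial) system. Feature names and sizes are illustrative.

def time_box(features, capacity_per_box):
    """Greedily pack (name, effort) features into fixed-capacity boxes."""
    boxes, current, used = [], [], 0
    for name, effort in features:
        if used + effort > capacity_per_box and current:
            boxes.append(current)        # deadline reached: ship what fits
            current, used = [], 0
        current.append(name)
        used += effort
    if current:
        boxes.append(current)            # final increment ships too
    return boxes

features = [("login", 3), ("search", 4), ("reports", 5), ("export", 2)]
increments = time_box(features, capacity_per_box=7)
print(increments)  # [['login', 'search'], ['reports', 'export']]
```

Each inner list is one time-boxed increment: a deliverable partial system, exactly as the bullet above describes.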

A basic RAD lifecycle has five phases (see Figure 6.10): project set-up, JAD workshops, iterative design and build, engineer and test final prototype, implementation review. The popularity of RAD has led to the emergence of an industry-standard RAD-based method called DSDM (Dynamic Systems Development Method). This was developed by a non-profit-making DSDM consortium made up of a group of companies that recognized the need for some standardization in the field. The first of nine principles stated as underlying DSDM is that “active user involvement is imperative.” The DSDM lifecycle is more complicated than the one we’ve shown here. It involves five phases: feasibility study, business study, functional model iteration, design and build iteration, and implementation. This is only a generic process and must be tailored for a particular organization.

ACTIVITY 6.5

How closely do you think the RAD lifecycle model relates to the interaction design model described in Section 6.4.1?

Comment

RAD and DSDM explicitly incorporate user involvement, evaluation and iteration. User involvement, however, appears to be limited to the JAD workshop, and iteration appears to be limited to the design and build phase. The philosophy underlying the interaction design model is present, but the flexibility appears not to be. Our interaction design process would be appropriately used within the design and build stage.

Figure 6.10 A basic RAD lifecycle: project initiation, JAD workshops, iterative design and build, evaluate final system, and implementation review.

BOX 6.4

A Product Design Process for Internet Appliances

Netpliance, which has moved into the market of providing Internet appliances, i.e. one-stop products that allow a user to achieve a specific Internet-based task, have adopted a user-centered approach to development based on RAD (Isensee et al., 2000). They attribute their ability to develop systems from concept to delivery in seven months to this strong iterative approach: the architecture was revised and iterated over several days; the code was developed with weekly feedback sessions from users; components were typically revised four times, but some went through 12 cycles. Their simple spiral model is shown in Figure 6.11.

Figure 6.11 Netpliance’s simple spiral model: rapid cycles of analysis, design, planning, and implementation.

The target audience for this appliance, called the i-opener, were people who did not use or own a PC and who may have been uncomfortable around computers. The designers were therefore looking to design something that would be as far away from the “traditional” PC model as possible, in terms of both hardware and software. In designing the software, they abandoned the desktop metaphor of the Windows operating system and concentrated on an interface that provided good support for the user’s task. For the hardware design, they needed to get away from the image of a large heavy box with lots of wires and plugs, any one of which may be faulty and cause the user problems.

The device provides three functions: sending and receiving email, categorical content, and web accessibility. That is it. There are no additional features, no complicated menus and options. The device is streamlined to perform these tasks and no more. This choice of functions was based on user studies and testing that served to identify the most frequently used functions, i.e., those that most appropriately supported the users. An example screen showing the news channel for i-opener is shown in Figure 6.12.

Identifying requirements for a new device is difficult. There is no direct experience of using a similar product, and so it is difficult to know what will be used, what will be needed, what will be frustrating, and what will be ignored. The Netpliance team started to gather information for their device by focusing on existing data about PC users: demographics, usability studies, areas of dissatisfaction, etc. They employed marketing research, focus groups, and user surveys to identify the key features of the appliance, and concentrated on delivering these fundamentals well. The team was multidisciplinary and included hardware engineers, user interface designers, marketing specialists, test specialists, industrial designers, and visual designers. Users were involved throughout development and the whole team took an active part in the design. The interface was designed first, to meet user requirements, and then the hardware and software were developed to fit the interface. In all of this, the emphasis was on a lean development process with a minimum of documentation, early prototyping, and frequent iterations for each component. For example, the design team developed scenarios of the appliance’s use to achieve a task. These helped developers to understand how the product could be used from a user’s perspective. We will return to similar techniques in Chapter 7.

Implementation was achieved through rapid cycles of implement and test. Small usability tests were conducted throughout implementation to find and fix usability problems. Developers and their families or friends were encouraged to use the appliance so that designers could enjoy the same experience as the users (called “eating your own dogfood”!). For these field tests, the product was instrumented so that the team could monitor how often each function was used. This data helped to prioritize the development of features as the product release deadline approached.

Figure 6.12 The news channel as part of the categorical content.

6.4.3 Lifecycle models in HCI

Another of the traditions from which interaction design has emerged is the field of HCI (human–computer interaction). Fewer lifecycle models have arisen from this field than from software engineering and, as you would expect, they have a stronger tradition of user focus. We describe two of these here. The first one, the Star, was derived from empirical work on understanding how designers tackled HCI design problems. This represents a very flexible process with evaluation at its core. In contrast, the second one, the usability engineering lifecycle, shows a more structured approach and hails from the usability engineering tradition.

The Star Lifecycle Model

About the same time that those involved in software engineering were looking for alternatives to the waterfall lifecycle, so too were people involved in HCI looking for alternative ways to support the design of interfaces. In 1989, the Star lifecycle model was proposed by Hartson and Hix (1989) (see Figure 6.13). This emerged from some empirical work they did looking at how interface designers went about their work. They identified two different modes of activity: analytic mode and synthetic mode. The former is characterized by such notions as top-down, organizing, judicial, and formal, working from the systems view towards the user’s view; the latter is characterized by such notions as bottom-up, free-thinking, creative and ad hoc, working from the user’s view towards the systems view. Interface designers move from one mode to another when designing. A similar behavior has been observed in software designers (Guindon, 1990).

Figure 6.13 The Star lifecycle model (Hartson and Hix, 1989): task analysis/functional analysis, requirements/specification, conceptual design/formal design representation, prototyping, and implementation, all interconnected through a central evaluation activity.

Unlike the lifecycle models introduced above, the Star lifecycle does not specify any ordering of activities. In fact, the activities are highly interconnected: you can move from any activity to any other, provided you first go through the evaluation activity. This reflects the findings of the empirical studies. Evaluation is central to this model, and whenever an activity is completed, its result(s) must be evaluated. So a project may start with requirements gathering, or it may start with evaluating an existing situation, or by analyzing existing tasks, and so on.
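This ordering rule (any activity may follow any other, but only by way of evaluation) can be expressed as a simple transition check. A Python sketch, using activity names from Figure 6.13; the `allowed` function is our illustration, not part of the published model:

```python
# A sketch of the Star lifecycle's ordering rule: activities are freely
# interconnected, but every transition passes through the central
# evaluation activity.
ACTIVITIES = {
    "task analysis/functional analysis",
    "requirements/specification",
    "conceptual design/formal design representation",
    "prototyping",
    "implementation",
    "evaluation",
}

def allowed(src, dst):
    """A move between two activities is permitted only if one of them
    is evaluation (results must be evaluated before moving on)."""
    return src in ACTIVITIES and dst in ACTIVITIES and "evaluation" in (src, dst)

print(allowed("prototyping", "evaluation"))      # True: evaluate the prototype
print(allowed("evaluation", "implementation"))   # True: evaluation can lead anywhere
print(allowed("prototyping", "implementation"))  # False: must evaluate first
```

Because evaluation sits on every path, a project can start anywhere (requirements gathering, analyzing existing tasks, evaluating an existing situation) and still produce evaluated results at each step.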

ACTIVITY 6.6

The Star lifecycle model has not been used widely and successfully for large projects in industry. Consider the benefits of lifecycle models introduced above and suggest why this may be.

Comment

One reason may be that the Star lifecycle model is extremely flexible. This may be how designers work in practice, but as we commented above, lifecycle models are popular because “they allow developers, and particularly managers, to get an overall view of the development effort so that progress can be tracked, deliverables specified, resources allocated, targets set, and so on.” With a model as flexible as the Star lifecycle, it is difficult to control these issues without substantially changing the model itself.

The Usability Engineering Lifecycle

The Usability Engineering Lifecycle was proposed by Deborah Mayhew in 1999 (Mayhew, 1999). Many people have written about usability engineering, and as Mayhew herself says, “I did not invent the concept of a Usability Engineering Lifecycle. Nor did I invent any of the Usability Engineering tasks included in the lifecycle . . . .” However, what her lifecycle does provide is a holistic view of usability engineering and a detailed description of how to perform usability tasks, and it specifies how usability tasks can be integrated into traditional software development lifecycles. It is therefore particularly helpful for those with little or no expertise in usability to see how the tasks may be performed alongside more traditional software engineering activities. For example, Mayhew has linked the stages with a general development approach (rapid prototyping) and a specific method (object-oriented software engineering (OOSE, Jacobson et al., 1992)) that have arisen from software engineering.

The lifecycle itself has essentially three tasks: requirements analysis, design/testing/development, and installation, with the middle stage being the largest and involving many subtasks (see Figure 6.14). Note the production of a set of usability goals in the first task. Mayhew suggests that these goals be captured in a style guide that is then used throughout the project to help ensure that the usability goals are adhered to.

This lifecycle follows a similar thread to our interaction design model but includes considerably more detail. It includes stages of identifying requirements, designing, evaluating, and building prototypes. It also explicitly includes the style guide as a mechanism for capturing and disseminating the usability goals of the project. Recognizing that some projects will not require the level of structure presented in the full lifecycle, Mayhew suggests that some substeps can be skipped if they are unnecessarily complex for the system being developed (e.g. websites).

Figure 6.14 The Usability Engineering Lifecycle (Mayhew, 1999): requirements analysis, design/testing/development (iterated until all functionality is addressed), and installation, with user feedback driving enhancements until all issues are resolved; the stages are linked to the OOSE design model/implementation model.

ACTIVITY 6.7

Study the usability engineering lifecycle and identify how this model differs from our interaction design model described in Section 6.4.1, in terms of the iterations it supports.

Comment

One of the main differences between Mayhew’s model and ours is that in the former the iteration between design and evaluation is contained within the second phase. Iteration between the design/test/development phase and the requirements analysis phase occurs only after the conceptual model and the detailed designs have been developed, prototyped, and evaluated one at a time. Our version models a return to the activity of identifying needs and establishing requirements after evaluating any element of the design.

Assignment

Nowadays, timepieces (such as clocks and wristwatches) have a variety of functions. They not only tell the time and date but they can speak to you, remind you when it’s time to do something, and provide a light in the dark, among other things. Mostly, however, the interface for these devices shows the time in one of two basic ways: as a digital number such as 23:40 or through an analog display with two or three hands, one to represent the hour, one for the minutes, and one for the seconds.

In this assignment, we want you to design an innovative timepiece for your own use. This could be in the form of a wristwatch, a mantelpiece clock, an electronic clock, or any other kind of clock you fancy. Your goal is to be inventive and exploratory. We have broken this assignment down into the following steps to make it clearer:

  1. Think about the interactive product you are designing: what do you want it to do for you? Find 3–5 potential users and ask them what they would want. Write a list of these requirements.
  2. Look around for similar devices and seek out other sources of inspiration that you might find helpful. Make a note of any findings that are interesting, useful or insightful.
  3. Sketch out some initial designs for the clock. Try to develop at least two distinct alternatives that both meet your set of requirements.
  4. Evaluate the two designs, using your usability criteria and by role playing an interaction with your sketches. Involve potential users in the evaluation, if possible. Does it do what you want? Is the time or other information being displayed always clear?
  5. Design is iterative, so you may want to return to earlier elements of the process before you choose one of your alternatives.

Summary

In this chapter, we have looked at the process of interaction design, i.e., what activities are required in order to design an interactive product, and how lifecycle models show the relationships between these activities. A simple interaction design model consisting of four activities was introduced, and issues surrounding the identification of users, generating alternative designs, and evaluating designs were discussed. Some lifecycle models from software engineering and HCI were introduced.

Key points

• The interaction design process consists of four basic activities: identifying needs and establishing requirements, developing alternative designs that meet those requirements, building interactive versions of the designs so that they can be communicated and assessed, and evaluating them.

Further reading

RUDISILL, M., LEWIS, C., POLSON, P. B., AND MCKAY, T. D. (1995) (eds.) Human-Computer Interface Design: Success Stories, Emerging Methods, Real-World Context. San Francisco: Morgan Kaufmann. This collection of papers describes the application of different approaches to interface design. Included here is an account of the Xerox Star development, some advice on how to choose among methods, and some practical examples of real-world developments.

BERGMAN, ERIC (2000) (ed.) Information Appliances and Beyond. San Francisco: Morgan Kaufmann. This book is an edited collection of papers which report on the experience of designing and building a variety of ‘information appliances’, i.e., purpose-built computer-based products which perform a specific task. Examples include the Palm Pilot, mobile telephones, a vehicle navigation system, and interactive toys for children.

MAYHEW, DEBORAH J. (1999) The Usability Engineering Lifecycle. San Francisco: Morgan Kaufmann. This is a very practical book about product user interface design. It explains how to perform usability tasks throughout development and provides useful examples along the way to illustrate the techniques. It links in with two software development based methods: rapid prototyping and object-oriented software engineering.

SOMMERVILLE, IAN (2001) Software Engineering (6th edition). Harlow, UK: Addison-Wesley. If you are interested in pursuing the software engineering aspects of the lifecycle models section, then this book provides a useful overview of the main models and their purpose.

NIELSEN, JAKOB (1993) Usability Engineering. San Francisco: Morgan Kaufmann. This is a seminal book on usability engineering. If you want to find out more about the philosophy, intent, history, or pragmatics of usability engineering, then this is a good place to start.

Prior to this, she was at the Royal College of Art where she started and directed the Computer Related Design Department, developing a program to enable artist-designers to develop and apply their traditional skills and knowledge to the design of all kinds of interactive products and systems.

GC: I believe that things should work but they should also delight. In the past, when it was really difficult to make things work, that was what people concentrated on. But now it’s much easier to make software and much easier to make hardware. We’ve got a load of technologies but they’re still often not designed for people—and they’re certainly not very enjoyable to use. If we think about other things in our life, our clothes, our furniture, the things we eat with, we choose what we use because they have a meaning beyond their practical use. Good design is partly about working really well, but it’s also about what something looks like, what it reminds us of, what it refers to in our broader cultural environment. It’s this side that interactive systems haven’t really addressed yet. They’re only just beginning to become part of culture. They are not just a tool for professionals any more, but an environment in which we live.

HS: How do you think we can improve things?

GC: The parallel with architecture is quite an interesting one. In architecture, a great deal of time and expense is put into the initial design; I don’t think very much money or time is put into the initial design of software. If you think of the big software engineering companies, how many people work in the design side rather than on the implementation side?

HS: When you say design do you mean conceptual design, or task design, or something else?

GC: I mean all phases of design. Firstly there’s research—finding out about people. This is not necessarily limited to finding out what they want, because if we’re designing new things, they are probably things people don’t even know they

could have. At the Royal College of Art we tried to work with users, but to be inspired by them, and not constrained by what they know is possible. The second stage is thinking, “What should this thing we are designing do?” You could call that con ceptual design. Then a third stage is thinking how do you represent it, how do you give it form? And then the fourth stage is actually crafting the interface—ex actly what color is this pixel? Is this type the right size, or do you need a size bigger? How much can you get on a screen?—all those things about the details. One of the problems companies have is that the feedback they get is. “I wish it did x.” Software looks as if it’s designed, not with a basic model of how it works that is then expressed on the interface, but as a load of different functions that are strung together. The desktop interface, although it has great advan tages, encourages the idea that you have a menu and you can just add a few more bits when people want more things. In today’s word processors, for instance, there isn’t a clear conceptual model about how it works, or an underlying theory people can use to rea son about why it is not working in the way they expect.

HS: So in trying to put more effort into the design aspect of things, do you think we need different people in the team?

GC: Yes. People in the software field tend to think that designers are people who know how to give the product form, which of course is one of the things they do. But a graphic designer, for instance, is somebody who also thinks at a more strategic level: "What is the message that these people want to get over and to whom?" and then, "What is the best way to give form to a message like that?" The part you see is the beautiful design, the lovely poster or record sleeve, or elegant book, but behind that is a lot of thinking about how to communicate ideas via a particular medium.

HS: If you've got people from different disciplines, have you experienced difficulties in communication?

GC: Absolutely. I think that people from different disciplines have different values, so different results and different approaches are valued. People have different temperaments, too, that have led them to the different fields in the first place, and they've been trained in different ways. In my view the big difference between the way engineers are trained and the way designers are trained is that engineers are trained to focus in on a solution from the beginning, whereas designers are trained to focus out to begin with and then focus in. They focus out and try lots of different alternatives, and they pick some and try them out to see how they go. Then they refine down. This is very hard for both the engineers and the designers, because the designers are thinking the engineers are trying to hone in much too quickly and the engineers can't bear the designers faffing about. They are trained to get their results in a completely different way.

HS: Is your idea to make each more tolerant of the other?

GC: Yes, my idea is not to try to make renaissance people, as I don't think it's feasible. Very few people can do everything well. I think the ideal team is made up of people who are really confident and good at what they do and open-minded enough to realize there are very different approaches. There's the scientific approach, the engineering approach, the design approach. All three are different and that's their value—you don't want everybody to be the same. The best combination is where you have engineers who understand design and designers who understand engineering. It's important that people know their limitations too. If you realize that you need an ergonomist, then you go and find one and you hire them to consult for you. So you need to know what you don't know as well as what you do.

HS: What other aspects of traditional design do you think help with interaction design?

GC: I think the ability to visualize things. It allows people to make quick prototypes or models or sketches so that a group of people can talk about something concrete. I think that's invaluable in the process. I think also making things that people like is just one of the things that good designers have a feel for.

HS: Do you mean aesthetically like, or like in its whole sense?

GC: In its whole sense. Obviously there's the aesthetic of what something looks like or feels like, but there's also the aesthetic of how it works as well. You can talk about an elegant way of doing something as well as an elegant look.

HS: Another trait I've seen in designers is being protective of their design.

GC: I think that is both a vice and a virtue. In order to keep a design coherent you need to keep a grip on the whole and to push it through as a whole. Otherwise it can happen that people try to make this a bit smaller and cut bits out of that, and so on, and before you know where you are the coherence of the design is lost. It is quite difficult for a team to hold a coherent vision of a design. If you think of other design fields, like film-making, for instance, there is one director and everybody accepts that it's the director's vision. One of the things that's wrong with products like Microsoft Word, for instance, is that there's no coherent idea in it that makes you think, "Oh yes, I understand how this fits with that."

Design is always a balance between things that work well and things that look good, and the ideal design satisfies everything, but in most designs you have to make trade-offs. If you're making a game it's more important that people enjoy it and that it looks good than to worry if some of it's a bit difficult. If you're making a fighter cockpit then the most important thing is that pilots don't fall out of the sky, and this informs the trade-offs you make. The question is who decides the criteria for the trade-offs that inevitably need to be made. This is not a matter of engineering: it's a matter of values—cultural, emotional, aesthetic.

HS: I know this is a controversial issue for some designers. Do you think users should be part of the design team?

GC: No, I don't. I think it's an abdication of responsibility. Users should definitely be involved as a source of inspiration, suggesting ideas, evaluating proposals—saying, "Yes, we think this would be great" or "No, we think this is an appalling idea." But in the end, if designers aren't better than the general public at designing things, what are they doing as designers?

THE DESIGN OF EVERYDAY THINGS

HUMAN ERROR? NO, BAD DESIGN

Most industrial accidents are caused by human error: estimates range between 75 and 95 percent. How is it that so many people are so incompetent? Answer: They aren’t. It’s a design problem.

If the number of accidents blamed upon human error were 1 to 5 percent, I might believe that people were at fault. But when the percentage is so high, then clearly other factors must be involved. When something happens this frequently, there must be another underlying factor.

When a bridge collapses, we analyze the incident to find the causes of the collapse and reformulate the design rules to ensure that form of accident will never happen again. When we discover that electronic equipment is malfunctioning because it is responding to unavoidable electrical noise, we redesign the circuits to be more tolerant of the noise. But when an accident is thought to be caused by people, we blame them and then continue to do things just as we have always done.

Physical limitations are well understood by designers; mental limitations are greatly misunderstood. We should treat all failures in the same way: find the fundamental causes and redesign the system so that these can no longer lead to problems.

We design equipment that requires people to be fully alert and attentive for hours, or to remember archaic, confusing procedures even if they are only used infrequently, sometimes only once in a lifetime. We put people in boring environments with nothing to do for hours on end, until suddenly they must respond quickly and accurately. Or we subject them to complex, high-workload environments, where they are continually interrupted while having to do multiple tasks simultaneously. Then we wonder why there is failure.

Even worse, when I talk to the designers and administrators of these systems, they admit that they too have nodded off while supposedly working. Some even admit to falling asleep for an instant while driving. They admit to turning the wrong stove burners on or off in their homes, and to other small but significant errors. Yet when their workers do this, they blame them for "human error." And when employees or customers have similar issues, they are blamed for not following the directions properly, or for not being fully alert and attentive.

Understanding Why There Is Error

Error occurs for many reasons. The most common is in the nature of the tasks and procedures that require people to behave in unnatural ways—staying alert for hours at a time, providing precise, accurate control specifications, all the while multitasking, doing several things at once, and subjected to multiple interfering activities. Interruptions are a common reason for error, not helped by designs and procedures that assume full, dedicated attention yet do not make it easy to resume operations after an interruption. And finally, perhaps the worst culprit of all is the attitude of people toward errors.

When an error causes a financial loss or, worse, leads to an injury or death, a special committee is convened to investigate the cause and, almost without fail, guilty people are found. The next step is to blame and punish them with a monetary fine, or by firing or jailing them. Sometimes a lesser punishment is proclaimed: make the guilty parties go through more training. Blame and punish; blame and train. The investigations and resulting punishments feel good: “We caught the culprit.” But it doesn’t cure the problem: the same error will occur over and over again. Instead, when an error happens, we should determine why, then redesign the product or the procedures being followed so that it will never occur again or, if it does, so that it will have minimal impact.

ROOT CAUSE ANALYSIS

Root cause analysis is the name of the game: investigate the accident until the single, underlying cause is found. What this ought to mean is that when people have indeed made erroneous decisions or actions, we should determine what caused them to err. This is what root cause analysis ought to be about. Alas, all too often it stops once a person is found to have acted inappropriately.

Trying to find the cause of an accident sounds good but it is flawed for two reasons. First, most accidents do not have a single cause: there are usually multiple things that went wrong, multiple events that, had any one of them not occurred, would have prevented the accident. This is what James Reason, the noted British authority on human error, has called the "Swiss cheese model of accidents" (shown in Figure 5.3 of this chapter on page 208, and discussed in more detail there).

Second, why does the root cause analysis stop as soon as a human error is found? If a machine stops working, we don't stop the analysis when we discover a broken part. Instead, we ask: "Why did the part break? Was it an inferior part? Were the required specifications too low? Did something apply too high a load on the part?" We keep asking questions until we are satisfied that we understand the reasons for the failure: then we set out to remedy them. We should do the same thing when we find human error: we should discover what led to the error. When root cause analysis discovers a human error in the chain, its work has just begun: now we apply the analysis to understand why the error occurred, and what can be done to prevent it.

One of the most sophisticated airplanes in the world is the US Air Force's F-22. However, it has been involved in a number of accidents, and pilots have complained that they suffered oxygen deprivation (hypoxia). In 2010, a crash destroyed an F-22 and killed the pilot. The Air Force investigation board studied the incident and two years later, in 2012, released a report that blamed the accident on pilot error: "failure to recognize and initiate a timely dive recovery due to channelized attention, breakdown of visual scan and unrecognized spatial distortion."

In 2013, the Inspector General's office of the US Department of Defense reviewed the Air Force's findings, disagreeing with the assessment. In my opinion, this time a proper root cause analysis was done. The Inspector General asked "why sudden incapacitation or unconsciousness was not considered a contributory factor." The Air Force, to nobody's surprise, disagreed with the criticism. They argued that they had done a thorough review and that their conclusion "was supported by clear and convincing evidence." Their only fault was that the report "could have been more clearly written."

It is only slightly unfair to parody the two reports this way:

Air Force: It was pilot error—the pilot failed to take corrective action.

Inspector General: That's because the pilot was probably unconscious.

Air Force: So you agree, the pilot failed to correct the problem.

THE FIVE WHYS

Root cause analysis is intended to determine the underlying cause of an incident, not the proximate cause. The Japanese have long followed a procedure for getting at root causes that they call the "Five Whys," originally developed by Sakichi Toyoda and used by the Toyota Motor Company as part of the Toyota Production System for improving quality. Today it is widely deployed. Basically, it means that when searching for the reason, even after you have found one, do not stop: ask why that was the case. And then ask why again. Keep asking until you have uncovered the true underlying causes. Does it take exactly five? No, but calling the procedure "Five Whys" emphasizes the need to keep going even after a reason has been found. Consider how this might be applied to the analysis of the F-22 crash:

Five Whys: Question and Answer

Q1: Why did the plane crash?
A1: Because it was in an uncontrolled dive.

Q2: Why didn't the pilot recover from the dive?
A2: Because the pilot failed to initiate a timely recovery.

Q3: Why was that?
A3: Because he might have been unconscious (or oxygen deprived).

Q4: Why was that?
A4: We don't know. We need to find out.

Etc.

The Five Whys of this example are only a partial analysis. For example, we need to know why the plane was in a dive (the report explains this, but it is too technical to go into here; suffice it to say that it, too, suggests that the dive was related to a possible oxygen deprivation).
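The mechanics of the procedure can be sketched as a simple loop: follow the chain of known causes, asking "why?" at each step, and flag the point where the investigation has no deeper answer yet. This is an illustrative sketch only; the `five_whys` helper and the cause chain below are hypothetical, paraphrasing the F-22 analysis above.

```python
# Illustrative sketch of the Five Whys procedure: keep asking "why?"
# along a chain of known causes until no deeper cause is recorded.
# All names here are hypothetical, not from Norman's text.

def five_whys(event, causes, max_depth=5):
    """Walk a cause chain, asking 'why?' at each step.

    `causes` maps an event to its known underlying cause, or None
    when the investigation has not yet found one.
    """
    chain = []
    current = event
    for _ in range(max_depth):
        cause = causes.get(current)
        if cause is None:
            # The analysis must not stop here: mark it for follow-up.
            chain.append((current, "unknown -- investigate further"))
            break
        chain.append((current, cause))
        current = cause
    return chain

causes = {
    "plane crashed": "uncontrolled dive",
    "uncontrolled dive": "pilot failed to initiate timely recovery",
    "pilot failed to initiate timely recovery":
        "pilot possibly unconscious (hypoxia)",
    "pilot possibly unconscious (hypoxia)": None,  # root cause unknown
}

for event, cause in five_whys("plane crashed", causes):
    print(f"Why: {event}? -> {cause}")
```

Note how the sketch makes the text's two warnings visible: the loop can stop too soon (`max_depth`), and a single linear chain cannot represent the multiple interacting causes most real accidents have.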

The Five Whys do not guarantee success. The question why is ambiguous and can lead to different answers by different investigators. There is still a tendency to stop too soon, perhaps when the limit of the investigator's understanding has been reached. It also tends to emphasize the need to find a single cause for an incident, whereas most complex events have multiple, complex causal factors. Nonetheless, it is a powerful technique.

The tendency to stop seeking reasons as soon as a human error has been found is widespread. I once reviewed a number of accidents in which highly trained workers at an electric utility company had been electrocuted when they contacted or came too close to the high-voltage lines they were servicing. All the investigating committees found the workers to be at fault, something even the workers (those who had survived) did not dispute. But when the committees were investigating the complex causes of the incidents, why did they stop once they found a human error? Why didn't they keep going to find out why the error had occurred, what circumstances had led to it, and then, why those circumstances had happened?

The committees never went far enough to find the deeper, root causes of the accidents. Nor did they consider redesigning the systems and procedures to make the incidents either impossible or far less likely. When people err, change the system so that type of error will be reduced or eliminated. When complete elimination is not possible, redesign to reduce the impact.

It wasn't difficult for me to suggest simple changes to procedures that would have prevented most of the incidents at the utility company. It had never occurred to the committee to think of this. The problem is that following my recommendations would have meant changing the culture from an attitude among the field workers that "We are supermen: we can solve any problem, repair the most complex outage. We do not make errors." It is not possible to eliminate human error if it is thought of as a personal failure rather than as a sign of poor design of procedures or equipment.

My report to the company executives was received politely. I was even thanked. Several years later I contacted a friend at the company and asked what changes they had made. "No changes," he said. "And we are still injuring people."

One big problem is that the natural tendency to blame someone for an error is shared even by those who made the error, who often agree that it was their fault. People do tend to blame themselves when they do something that, after the fact, seems inexcusable. "I knew better" is a common comment by those who have erred. But when someone says, "It was my fault, I knew better," this is not a valid analysis of the problem. It doesn't help prevent its recurrence. When many people all have the same problem, shouldn't another cause be found? If the system lets you make the error, it is badly designed. And if the system induces you to make the error, then it is really badly designed. When I turn on the wrong stove burner, it is not due to my lack of knowledge: it is due to poor mapping between controls and burners. Teaching me the relationship will not stop the error from recurring: redesigning the stove will.

We can’t fix problems unless people admit they exist. When we blame people, it is then difficult to convince organizations to restructure the design to eliminate these problems. After all, if a person is at fault, replace the person. But seldom is this the case: usually the system, the procedures, and social pressures have led to the problems, and the problems won’t be fixed without address ing all of these factors.

Why do people err? Because the designs focus upon the requirements of the system and the machines, and not upon the requirements of people. Most machines require precise commands and guidance, forcing people to enter numerical information perfectly. But people aren't very good at great precision. We frequently make errors when asked to type or write sequences of numbers or letters. This is well known: so why are machines still being designed that require such great precision, where pressing the wrong key can lead to horrendous results?

People are creative, constructive, exploratory beings. We are particularly good at novelty, at creating new ways of doing things, and at seeing new opportunities. Dull, repetitive, precise requirements fight against these traits. We are alert to changes in the environment, noticing new things, and then thinking about them and their implications. These are virtues, but they get turned into negative features when we are forced to serve machines. Then we are punished for lapses in attention, for deviating from the tightly prescribed routines.

A major cause of error is time stress. Time is often critical, especially in such places as manufacturing or chemical processing plants and hospitals. But even everyday tasks can have time pressures. Add environmental factors, such as poor weather or heavy traffic, and the time stresses increase. In commercial establishments, there is strong pressure not to slow the processes, because doing so would inconvenience many, lead to significant loss of money, and, in a hospital, possibly decrease the quality of patient care. There is a lot of pressure to push ahead with the work even when an outside observer would say it was dangerous to do so. In many industries, if the operators actually obeyed all the procedures, the work would never get done. So we push the boundaries: we stay up far longer than is natural. We try to do too many tasks at the same time. We drive faster than is safe. Most of the time we manage okay. We might even be rewarded and praised for our heroic efforts. But when things go wrong and we fail, then this same behavior is blamed and punished.

Deliberate Violations

Errors are not the only type of human failure. Sometimes people knowingly take risks. When the outcome is positive, they are often rewarded. When the result is negative, they might be punished. But how do we classify these deliberate violations of known, proper behavior? In the error literature, they tend to be ignored. In the accident literature, they are an important component.

Deliberate deviations play an important role in many accidents. They are defined as cases where people intentionally violate procedures and regulations. Why do they happen? Well, almost every one of us has probably deliberately violated laws, rules, or even our own best judgment at times. Ever go faster than the speed limit? Drive too fast in the snow or rain? Agree to do some hazardous act, even while privately thinking it foolhardy to do so? In many industries, the rules are written more with a goal toward legal compliance than with an understanding of the work requirements. As a result, if workers followed the rules, they couldn't get their jobs done. Do you sometimes prop open locked doors? Drive with too little sleep? Work with co-workers even though you are ill (and might therefore be infectious)?

Routine violations occur when noncompliance is so frequent that it is ignored. Situational violations occur when there are special circumstances (example: going through a red light "because no other cars were visible and I was late"). In some cases, the only way to complete a job might be to violate a rule or procedure.

A major cause of violations is inappropriate rules or procedures that not only invite violation but encourage it. Without the violations, the work could not be done. Worse, when employees feel it necessary to violate the rules in order to get the job done and, as a result, succeed, they will probably be congratulated and rewarded. This, of course, unwittingly rewards noncompliance. Cultures that encourage and commend violations set poor role models.

Although violations are a form of error, these are organizational and societal errors, important but outside the scope of the design of everyday things. The human error examined here is unintentional: deliberate violations, by definition, are intentional deviations that are known to be risky, with the potential of doing harm.

Two Types of Errors: Slips and Mistakes

Many years ago, the British psychologist James Reason and I developed a general classification of human error. We divided human error into two major categories: slips and mistakes (Figure 5.1). This classification has proved to be of value for both theory and practice. It is widely used in the study of error in such diverse areas as industrial and aviation accidents, and medical errors. The discussion gets a little technical, so I have kept technicalities to a minimum. This topic is of extreme importance to design, so stick with it.

DEFINITIONS: ERRORS, SLIPS, AND MISTAKES

Human error is defined as any deviance from "appropriate" behavior. The word appropriate is in quotes because in many circumstances, the appropriate behavior is not known or is only determined after the fact. But still, error is defined as deviance from the generally accepted correct or appropriate behavior.

FIGURE 5.1. Classification of Errors. Errors have two major forms. Slips occur when the goal is correct, but the required actions are not done properly: the execution is flawed. Mistakes occur when the goal or plan is wrong. Slips and mistakes can be further divided based upon their underlying causes. Memory lapses can lead to either slips or mistakes, depending upon whether the memory failure was at the highest level of cognition (mistakes) or at lower (subconscious) levels (slips). Although deliberate violations of procedures are clearly inappropriate behaviors that often lead to accidents, these are not considered as errors (see discussion in text).

Error is the general term for all wrong actions. There are two major classes of error: slips and mistakes, as shown in Figure 5.1; slips are further divided into two major classes and mistakes into three. These categories of errors all have different implications for design. I now turn to a more detailed look at these classes of errors and their design implications.

SLIPS

A slip occurs when a person intends to do one action and ends up doing something else. With a slip, the action performed is not the same as the action that was intended.

There are two major classes of slips: action-based and memory-lapse. In action-based slips, the wrong action is performed. In lapses, memory fails, so the intended action is not done or its results not evaluated. Action-based slips and memory lapses can be further classified according to their causes.

Example of an action-based slip. I poured some milk into my coffee and then put the coffee cup into the refrigerator. This is the correct action applied to the wrong object.

Example of a memory-lapse slip. I forget to turn off the gas burner on my stove after cooking dinner.

MISTAKES

A mistake occurs when the wrong goal is established or the wrong plan is formed. From that point on, even if the actions are executed properly they are part of the error, because the actions themselves are inappropriate—they are part of the wrong plan. With a mistake, the action that is performed matches the plan: it is the plan that is wrong.

Mistakes have three major classes: rule-based, knowledge-based, and memory-lapse. In a rule-based mistake, the person has appropriately diagnosed the situation, but then decided upon an erroneous course of action: the wrong rule is being followed. In a knowledge-based mistake, the problem is misdiagnosed because of erroneous or incomplete knowledge. In a memory-lapse mistake, there is forgetting at the stages of goals, plans, or evaluation.

Example of a knowledge-based mistake. Weight of fuel was computed in pounds instead of kilograms.

Example of a memory-lapse mistake. A mechanic failed to complete troubleshooting because of distraction.
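The taxonomy above can be summarized in a small data model. The enum members and the `is_mistake` helper below are hypothetical illustrations of the classification, not part of the original text: a mistake lives at the level of the goal or plan, a slip at the level of execution.

```python
from enum import Enum

# Hypothetical encoding of the slip/mistake classification (Figure 5.1).
class ErrorType(Enum):
    ACTION_SLIP = "correct goal, flawed execution"
    MEMORY_LAPSE_SLIP = "intended action forgotten (subconscious level)"
    RULE_MISTAKE = "situation diagnosed correctly, wrong rule applied"
    KNOWLEDGE_MISTAKE = "problem misdiagnosed"
    MEMORY_LAPSE_MISTAKE = "goal, plan, or evaluation forgotten"

MISTAKES = {
    ErrorType.RULE_MISTAKE,
    ErrorType.KNOWLEDGE_MISTAKE,
    ErrorType.MEMORY_LAPSE_MISTAKE,
}

def is_mistake(err: ErrorType) -> bool:
    """Mistakes are errors in the goal or plan; slips are errors in
    carrying out a correct goal."""
    return err in MISTAKES

# Milk poured, cup put in the refrigerator: goal right, execution wrong.
print(is_mistake(ErrorType.ACTION_SLIP))        # a slip
# Fuel weight computed in pounds instead of kilograms: wrong plan.
print(is_mistake(ErrorType.KNOWLEDGE_MISTAKE))  # a mistake
```

The point of separating the two sets is the design implication the chapter draws: slips call for better execution support (distinct controls, reminders), mistakes for better conceptual models.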

ERROR AND THE SEVEN STAGES OF ACTION

Errors can be understood through reference to the seven stages of the action cycle of Chapter 2 (Figure 5.2). Mistakes are errors in setting the goal or plan, and in comparing results with expectations—the higher levels of cognition. Slips happen in the execution of a plan, or in the perception or interpretation of the outcome—the lower stages. Memory lapses can happen at any of the eight transitions between stages, shown by the X's in Figure 5.2B. A memory lapse at one of these transitions stops the action cycle from proceeding, and so the desired action is not completed.

FIGURE 5.2. Where Slips and Mistakes Originate in the Action Cycle. Figure A shows that action slips come from the bottom four stages of the action cycle and mistakes from the top three stages. Memory lapses impact the transitions between stages (shown by the X's in Figure B). Memory lapses at the higher levels lead to mistakes, and lapses at the lower levels lead to slips.

Slips are the result of subconscious actions getting waylaid en route. Mistakes result from conscious deliberations. The same processes that make us creative and insightful by allowing us to see relationships between apparently unrelated things, that let us leap to correct conclusions on the basis of partial or even faulty evidence, also lead to mistakes. Our ability to generalize from small amounts of information helps tremendously in new situations; but sometimes we generalize too rapidly, classifying a new situation as similar to an old one when, in fact, there are significant discrepancies. This leads to mistakes that can be difficult to discover, let alone eliminate.

The Classification of Slips

A colleague reported that he went to his car to drive to work. As he drove away, he realized that he had forgotten his briefcase, so he turned around and went back. He stopped the car, turned off the engine, and unbuckled his wristwatch. Yes, his wristwatch, instead of his seatbelt.

The story illustrates both a memory-lapse slip and an action slip. The forgetting of the briefcase is a memory-lapse slip. The unbuckling of the wristwatch is an action slip, in this case a combination of description-similarity and capture error (described later in this chapter).

Most everyday errors are slips. Intending to do one action, you find yourself doing another. When a person says something clearly and distinctly to you, you "hear" something quite different. The study of slips is the study of the psychology of everyday errors—what Freud called "the psychopathology of everyday life." Freud believed that slips have hidden, dark meanings, but most are accounted for by rather simple mental mechanisms.

An interesting property of slips is that, paradoxically, they tend to occur more frequently to skilled people than to novices. Why? Because slips often result from a lack of attention to the task. Skilled people—experts—tend to perform tasks automatically, under subconscious control. Novices have to pay considerable conscious attention, resulting in a relatively low occurrence of slips.

Some slips result from the similarities of actions. Or an event in the world may automatically trigger an action. Sometimes our thoughts and actions may remind us of unintended actions, which we then perform. There are numerous different kinds of action slips, categorized by the underlying mechanisms that give rise to them. The three most relevant to design are:

CAPTURE SLIPS

I was using a copying machine, and I was counting the pages. I found myself counting, “1, 2, 3, 4, 5, 6, 7, 8, 9, 10, Jack, Queen, King.” I had been playing cards recently.

The capture slip is defined as the situation where, instead of the desired activity, a more frequently or recently performed one gets done instead: it captures the activity. Capture errors require that part of the action sequences involved in the two activities be identical, with one sequence being far more familiar than the other. After doing the identical part, the more frequent or more recent activity continues, and the intended one does not get done. Seldom, if ever, does the unfamiliar sequence capture the familiar one. All that is needed is a lapse of attention to the desired action at the critical junction when the identical portions of the sequences diverge into the two different activities. Capture errors are, therefore, partial memory-lapse errors. Interestingly, capture errors are more prevalent in experienced skilled people than in beginners, in part because the experienced person has automated the required actions and may not be paying conscious attention when the intended action deviates from the more frequent one.

Designers need to avoid procedures that have identical opening steps but then diverge. The more experienced the workers, the more likely they are to fall prey to capture. Whenever possible, sequences should be designed to differ from the very start.
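That design rule can be checked mechanically: given two procedures written as step sequences, measure how long their identical opening runs are. The `shared_prefix` helper and the example step names below are hypothetical, meant only to illustrate the guidance above.

```python
# Hypothetical check for capture-slip risk: two procedures that share a
# long identical opening are candidates for redesign so that they
# diverge from the very first step.

def shared_prefix(seq_a, seq_b):
    """Return the identical opening steps of two procedures."""
    prefix = []
    for step_a, step_b in zip(seq_a, seq_b):
        if step_a != step_b:
            break
        prefix.append(step_a)
    return prefix

shutdown = ["open panel", "insert key", "turn key", "press STOP"]
restart = ["open panel", "insert key", "turn key", "press START"]

overlap = shared_prefix(shutdown, restart)
if len(overlap) >= 2:
    print(f"capture-slip risk: {len(overlap)} identical opening steps")
```

A review tool built on such a check would flag the shutdown/restart pair above, since only the final step distinguishes them.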

DESCRIPTION-SIMILARITY SLIPS

A former student reported that one day he came home from jogging, took off his sweaty shirt, and rolled it up in a ball, intending to throw it in the laundry basket. Instead he threw it in the toilet. (It wasn’t poor aim: the laundry basket and toilet were in different rooms.)

In the slip known as a description-similarity slip, the error is to act upon an item similar to the target. This happens when the description of the target is sufficiently vague. Much as we saw in Chapter 3, Figure 3.1, where people had difficulty distinguishing among different images of money because their internal descriptions did not have sufficient discriminating information, the same thing can happen to us, especially when we are tired, stressed, or overloaded. In the example that opened this section, both the laundry basket and the toilet bowl are containers, and if the description of the target was sufficiently ambiguous, such as "a large enough container," the slip could be triggered.

Remember the discussion in Chapter 3: most objects don’t need precise descriptions, simply enough precision to distinguish the desired target from alternatives. This means that a description that usually suffices may fail when the situation changes so that multiple similar items now match the description. Description-similarity errors result in performing the correct action on the wrong object. Obviously, the more the wrong and right objects have in common, the more likely the errors are to occur. Similarly, the more objects present at the same time, the more likely the error.

Designers need to ensure that controls and displays for different purposes are significantly different from one another. A lineup of identical-looking switches or displays is very apt to lead to description-similarity error. In the design of airplane cockpits, many controls are shape-coded so that they both look and feel different from one another: the throttle levers are different from the flap levers (which might look and feel like a wing flap), which are different from the landing gear control (which might look and feel like a wheel).

MEMORY-LAPSE SLIPS

Memory lapses are common causes of error. They can lead to several kinds of errors: failing to do all of the steps of a procedure; repeating steps; forgetting the outcome of an action; or forgetting the goal or plan, thereby causing the action to be stopped. The immediate cause of most memory-lapse failures is interruption: events that intervene between the time an action is decided upon and the time it is completed. Quite often the interference comes from the machines we are using: the many steps required between the start and finish of an operation can overload the capacity of short-term or working memory.

There are several ways to combat memory-lapse errors. One is to minimize the number of steps; another, to provide vivid reminders of steps that need to be completed. A superior method is to use the forcing function of Chapter 4. For example, automated teller machines often require removal of the bank card before delivering the requested money: this prevents forgetting the bank card, capitalizing on the fact that people seldom forget the goal of the activity, in this case the money. With pens, the solution is simply to prevent their removal, perhaps by chaining public pens to the counter. Not all memory-lapse errors lend themselves to simple solutions. In many cases the interruptions come from outside the system, where the designer has no control.
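The logic of the ATM forcing function can be sketched in a few lines (a hypothetical illustration, not any real machine’s software): cash is simply withheld until the easily forgotten item, the card, has come out.

```python
# Hypothetical sketch of a forcing function: the machine refuses to
# dispense cash while the bank card is still inside, so the card
# (easily forgotten) must come out before the money (never forgotten).

class ATM:
    def __init__(self):
        self.card_inserted = False
        self.cash_staged = False

    def insert_card(self):
        self.card_inserted = True

    def request_cash(self):
        self.cash_staged = True   # cash is counted out but held back

    def dispense(self):
        if self.card_inserted:
            return "remove card first"    # the forcing function
        if self.cash_staged:
            self.cash_staged = False
            return "cash dispensed"
        return "nothing to dispense"

    def remove_card(self):
        self.card_inserted = False

atm = ATM()
atm.insert_card()
atm.request_cash()
first_attempt = atm.dispense()   # blocked: card still in the machine
atm.remove_card()
second_attempt = atm.dispense()  # the goal of the activity is released last
```

Because retrieving the money is the goal of the whole activity, sequencing it after card removal converts a likely memory lapse into an impossibility.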

MODE-ERROR SLIPS

A mode error occurs when a device has different states in which the same controls have different meanings: we call these states modes. Mode errors are inevitable in anything that has more possible actions than it has controls or displays; that is, the controls mean different things in the different modes. This is unavoidable as we add more and more functions to our devices.

Ever turn off the wrong device in your home entertainment system? This happens when one control is used for multiple purposes. In the home, this is simply frustrating. In industry, the confusion that results when operators believe the system to be in one mode, when in reality it is in another, has resulted in serious accidents and loss of life.

It is tempting to save money and space by having a single control serve multiple purposes. Suppose there are ten different functions on a device. Instead of using ten separate knobs or switches—which would take considerable space, add extra cost, and appear intimidatingly complex—why not use just two controls, one to select the function, the other to set the function to the desired condition? Although the resulting design appears quite simple and easy to use, this apparent simplicity masks the underlying complexity of use. The operator must always be completely aware of the mode, of what function is active. Alas, the prevalence of mode errors shows that this requirement is seldom met. Yes, if I select a mode and then immediately adjust the parameters, I am not apt to be confused about the state. But what if I select the mode and then get interrupted by other events? Or if the mode is maintained for considerable periods? Or, as in the case of the Airbus accident discussed below, the two modes being selected are very similar in control and function but have different operating characteristics, which means that the resulting mode error is difficult to discover? Sometimes the use of modes is justifiable, as when many controls and displays must fit in a small, restricted space, but whatever the reason, modes are a common cause of confusion and error.
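The hazard can be made concrete with a sketch (a hypothetical two-control device, not any product described here): a single “up” button whose meaning depends entirely on which mode was last selected.

```python
# Hypothetical sketch of a moded device: one adjustment button whose
# effect depends on the currently selected function.

class TwoControlClock:
    def __init__(self):
        self.mode = "time"            # the mode-select control
        self.time_minutes = 7 * 60    # 7:00, stored as minutes past midnight
        self.alarm_minutes = 6 * 60   # 6:00

    def select_mode(self, mode):
        self.mode = mode              # "time" or "alarm"

    def press_up(self):
        # The same physical button means different things in different modes.
        if self.mode == "time":
            self.time_minutes += 1
        else:
            self.alarm_minutes += 1

clock = TwoControlClock()
clock.select_mode("alarm")   # selected earlier, then forgotten
clock.press_up()             # user believes this nudges the clock forward
# The alarm changed instead; nothing about the button itself reveals the error.
```

Nothing in the single control distinguishes the two meanings; only the remembered (or displayed) mode does, which is exactly why the design invites mode errors.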

Alarm clocks often use the same controls and display for setting the time of day and the time the alarm should go off, and many of us have thereby set one when we meant the other. Similarly, when time is displayed on a twelve-hour scale, it is easy to set the alarm to go off at seven a.m. only to discover later that the alarm had been set for seven p.m. The use of “a.m.” and “p.m.” to distinguish times before and after noon is a common source of confusion and error, hence the common use of 24-hour time specification throughout most of the world (the major exceptions being North America, Australia, India, and the Philippines). Watches with multiple functions have similar problems, in this case required because of the small amount of space available for controls and displays.

Modes exist in most computer programs, in our cell phones, and in the automatic controls of commercial aircraft. A number of serious accidents in commercial aviation can be attributed to mode errors, especially in aircraft that use automatic systems (which have a large number of complex modes). As automobiles become more complex, with dashboard controls for driving, heating and air-conditioning, entertainment, and navigation, modes are increasingly common.

An accident with an Airbus airplane illustrates the problem. The flight control equipment (often referred to as the automatic pilot) had two modes, one for controlling vertical speed, the other for controlling the flight path’s angle of descent. While attempting to land, the pilots thought that they were controlling the angle of descent, whereas they had accidentally selected the mode that controlled speed of descent. The number (–3.3) that was entered into the system to represent an appropriate angle (–3.3°) was too steep a rate of descent when interpreted as vertical speed (–3,300 feet/minute: –3.3° would be only about –800 feet/minute). This mode confusion contributed to the resulting fatal accident. After a detailed study of the accident, Airbus changed the display on the instrument so that vertical speed would always be displayed with a four-digit number and angle with a two-digit number, thus reducing the chance of confusion.
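The arithmetic behind the confusion can be checked directly. Assuming a typical approach ground speed of about 140 knots (my assumption for illustration; the report’s exact figure is not given here), the same digits “3.3” imply descent rates that differ by roughly a factor of four:

```python
import math

def descent_rate_fpm(angle_deg, ground_speed_knots):
    """Descent rate (feet/minute) implied by a flight-path angle at a given ground speed."""
    feet_per_minute_per_knot = 6076.12 / 60   # one knot expressed in feet per minute
    return ground_speed_knots * feet_per_minute_per_knot * math.tan(math.radians(angle_deg))

# Flight-path-angle mode: "-3.3" means a 3.3-degree descent,
# roughly 800 ft/min at the assumed 140-knot approach speed.
angle_mode_fpm = descent_rate_fpm(3.3, 140)

# Vertical-speed mode: the same "-3.3" is read as 3,300 ft/min down.
vertical_speed_mode_fpm = 3300.0

print(round(angle_mode_fpm), vertical_speed_mode_fpm / angle_mode_fpm)
```

With the same displayed digits, the selected mode alone determines whether the aircraft descends at a gentle rate near 800 feet/minute or a dangerously steep 3,300 feet/minute.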

Mode error is really design error. Mode errors are especially likely where the equipment does not make the mode visible, so the user is expected to remember what mode has been established, sometimes hours earlier, during which time many intervening events might have occurred. Designers must try to avoid modes, but if they are necessary, the equipment must make it obvious which mode is invoked. Once again, designers must always compensate for interfering activities.

The Classification of Mistakes

Mistakes result from the choice of inappropriate goals and plans or from faulty comparison of the outcome with the goals during evaluation. In mistakes, a person makes a poor decision, misclassifies a situation, or fails to take all the relevant factors into account. Many mistakes arise from the vagaries of human thought, often because people tend to rely upon remembered experiences rather than on more systematic analysis. We make decisions based upon what is in our memory. But as discussed in Chapter 3, retrieval from long-term memory is actually a reconstruction rather than an accurate record. As a result, it is subject to numerous biases. Among other things, our memories tend to be biased toward overgeneralization of the commonplace and overemphasis of the discrepant.

The Danish engineer Jens Rasmussen distinguished among three modes of behavior: skill-based, rule-based, and knowledge-based. This three-level classification scheme provides a practical tool that has found wide acceptance in applied areas, such as the design of many industrial systems. Skill-based behavior occurs when workers are extremely expert at their jobs, so they can do the everyday, routine tasks with little or no thought or conscious attention. The most common form of error in skill-based behavior is the slip.

Rule-based behavior occurs when the normal routine is no longer applicable but the new situation is one that is known, so there is already a well-prescribed course of action: a rule. Rules might simply be learned behaviors from previous experiences, but they also include formal procedures prescribed in courses and manuals, usually in the form of “if-then” statements, such as, “If the engine will not start, then do [the appropriate action].” Errors with rule-based behavior can be either mistakes or slips. If the wrong rule is selected, this is a mistake. If the error occurs during the execution of the rule, it is most likely a slip.

Knowledge-based procedures are needed when unfamiliar events occur, where neither existing skills nor rules apply. In this case, there must be considerable reasoning and problem-solving. Plans might be developed, tested, and then used or modified. Here, conceptual models are essential in guiding development of the plan and interpretation of the situation.

In both rule-based and knowledge-based situations, the most serious mistakes occur when the situation is misdiagnosed. As a result, an inappropriate rule is executed, or in the case of knowledge-based problems, the effort is addressed to solving the wrong problem. In addition, with misdiagnosis of the problem comes misinterpretation of the environment, as well as faulty comparisons of the current state with expectations. These kinds of mistakes can be very difficult to detect and correct.

RULE-BASED MISTAKES

When new procedures have to be invoked or when simple problems arise, we can characterize the actions of skilled people as rule-based. Some rules come from experience; others are formal procedures in manuals or rulebooks, or even less formal guides, such as cookbooks for food preparation. In either case, all we must do is identify the situation, select the proper rule, and then follow it.

When driving, behavior follows well-learned rules. Is the light red? If so, stop the car. Wish to turn left? Signal the intention to turn and move as far left as legally permitted: slow the vehicle and wait for a safe break in traffic, all the while following the traffic rules and relevant signs and lights.

Rule-based mistakes occur in multiple ways:

Example 1: In 2013, at the Kiss nightclub in Santa Maria, Brazil, pyrotechnics used by the band ignited a fire that killed over 230 people. The tragedy illustrates several mistakes. The band made a knowledge-based mistake when they used outdoor flares, which ignited the ceiling’s acoustic tiles: the band thought the flares were safe. Many people rushed into the restrooms, mistakenly thinking they were exits: they died. Early reports suggested that the guards, unaware of the fire, at first mistakenly blocked people from leaving the building. Why? Because nightclub attendees would sometimes leave without paying for their drinks. The mistake was in devising a rule that did not take account of emergencies. A root cause analysis would reveal that the goal was to prevent inappropriate exit but still allow the doors to be used in an emergency. One solution is doors that trigger alarms when used, deterring people trying to sneak out but allowing exit when needed.

Example 2: Turning the thermostat of an oven to its maximum temperature to get it to the proper cooking temperature faster is a mistake based upon a false conceptual model of the way the oven works. If the person wanders off and forgets to come back and check the oven temperature after a reasonable period (a memory-lapse slip), the improper high setting of the oven temperature can lead to an accident, possibly a fire.

Example 3: A driver, unaccustomed to anti-lock brakes, encounters an unexpected object in the road on a wet, rainy day. The driver applies full force to the brakes, but the car skids, triggering the anti-lock brakes to rapidly turn the brakes on and off, as they are designed to do. The driver, feeling the vibrations, believes that they indicate a malfunction and therefore lifts his foot off the brake pedal. In fact, the vibration is a signal that the anti-lock brakes are working properly. The driver’s misevaluation leads to the wrong behavior.

Rule-based mistakes are difficult to avoid and then difficult to detect. Once the situation has been classified, the selection of the appropriate rule is often straightforward. But what if the classification of the situation is wrong? This is difficult to discover because there is usually considerable evidence to support the erroneous classification of the situation and the choice of rule. In complex situations, the problem is too much information: information that both supports the decision and also contradicts it. In the face of time pressures to make a decision, it is difficult to know which evidence to consider and which to reject. People usually decide by taking the current situation and matching it with something that happened earlier. Although human memory is quite good at matching examples from the past with the present situation, this doesn’t mean that the match is accurate or appropriate. The matching is biased by recency, regularity, and uniqueness. Recent events are remembered far better than less recent ones. Frequent events are remembered through their regularities, and unique events are remembered because of their uniqueness. But suppose the current event is different from all that has been experienced before: people are still apt to find some match in memory to use as a guide. The same powers that make us so good at dealing with the common and the unique lead to severe error with novel events.

What is a designer to do? Provide as much guidance as possible to ensure that the current state of things is displayed in a coherent and easily interpreted format—ideally graphical. This is a difficult problem. All major decision makers worry about the complexity of real-world events, where the problem is often too much information, much of it contradictory. Often, decisions must be made quickly. Sometimes it isn’t even clear that there is an incident or that a decision is actually being made.

Think of it like this. In your home, there are probably a number of broken or misbehaving items. There might be some burnt-out lights, or (in my home) a reading light that works fine for a little while, then goes out: we have to walk over and wiggle the fluorescent bulb. There might be a leaky faucet or other minor faults that you know about but are postponing action to remedy. Now consider a major process-control manufacturing plant (an oil refinery, a chemical plant, or a nuclear power plant). These have thousands, perhaps tens of thousands, of valves and gauges, displays and controls, and so on. Even the best of plants always has some faulty parts. The maintenance crews always have a list of items to take care of. With all the alarms that trigger when a problem arises, even though it might be minor, and all the everyday failures, how does one know which might be a significant indicator of a major problem? Every single one usually has a simple, rational explanation, so not making it an urgent item is a sensible decision. In fact, the maintenance crew simply adds it to a list. Most of the time, this is the correct decision. The one time in a thousand (or even one time in a million) that the decision is wrong makes it the one they will be blamed for: how could they have missed such obvious signals?

Hindsight is always superior to foresight. When the accident investigation committee reviews the events that contributed to the problem, they know what actually happened, so it is easy for them to pick out which information was relevant and which was not. This is retrospective decision making. But when the incident was taking place, the people were probably overwhelmed with far too much irrelevant information and probably not a lot of relevant information. How were they to know which to attend to and which to ignore? Most of the time, experienced operators get things right. The one time they fail, the retrospective analysis is apt to condemn them for missing the obvious. Well, during the event, nothing may be obvious. I return to this topic later in the chapter.

You will face this while driving, while handling your finances, and while just going through your daily life. Most of the unusual incidents you read about are not relevant to you, so you can safely ignore them. Which things should be paid attention to, which should be ignored? Industry faces this problem all the time, as do governments. The intelligence communities are swamped with data. How do they decide which cases are serious? The public hears about their mistakes, but not about the far more frequent cases that they got right or about the times they ignored data as not being meaningful—and were correct to do so.

If every decision had to be questioned, nothing would ever get done. But if decisions are not questioned, there will be major mistakes—rare, but often of substantial penalty.

The design challenge is to present the information about the state of the system (a device, vehicle, plant, or activities being monitored) in a way that is easy to assimilate and interpret, as well as to provide alternative explanations and interpretations. It is useful to question decisions, but impossible to do so if every action—or failure to act—requires close attention.

This is a difficult problem with no obvious solution.

KNOWLEDGE-BASED MISTAKES

Knowledge-based behavior takes place when the situation is novel enough that there are no skills or rules to cover it. In this case, a new procedure must be devised. Whereas skills and rules are controlled at the behavioral level of human processing and are therefore subconscious and automatic, knowledge-based behavior is controlled at the reflective level and is slow and conscious.

With knowledge-based behavior, people are consciously problem solving. They are in an unknown situation and do not have any available skills or rules that apply directly. Knowledge-based behavior is required either when a person encounters an unknown situation, perhaps being asked to use some novel equipment, or even when doing a familiar task and things go wrong, leading to a novel, uninterpretable state.

The best solution to knowledge-based situations is to be found in a good understanding of the situation, which in most cases also translates into an appropriate conceptual model. In complex cases, help is needed, and here is where good cooperative problem-solving skills and tools are required. Sometimes, good procedural manuals (paper or electronic) will do the job, especially if critical observations can be used to arrive at the relevant procedures to follow. A more powerful approach is to develop intelligent computer systems, using good search and appropriate reasoning techniques (artificial-intelligence decision-making and problem-solving). The difficulties here are in establishing the interaction of the people with the automation: human teams and automated systems have to be thought of as collaborative, cooperative systems. Instead, they are often built by assigning the tasks that machines can do to the machines and leaving the humans to do the rest. This usually means that machines do the parts that are easy for people, but when the problems become complex, which is precisely when people could use assistance, that is when the machines usually fail. (I discuss this problem extensively in The Design of Future Things.)

MEMORY-LAPSE MISTAKES

Memory lapses can lead to mistakes if the memory failure leads to forgetting the goal or plan of action. A common cause of the lapse is an interruption that leads to forgetting the evaluation of the current state of the environment. These lead to mistakes, not slips, because the goals and plans become wrong. Forgetting earlier evaluations often means remaking the decision, sometimes erroneously.

The design cures for memory-lapse mistakes are the same as for memory-lapse slips: ensure that all the relevant information is continuously available. The goals, plans, and current evaluation of the system are of particular importance and should be continually available. Far too many designs eliminate all signs of these items once they have been made or acted upon. Once again, the designer should assume that people will be interrupted during their activities and that they may need assistance in resuming their operations.

Social and Institutional Pressures

A subtle issue that seems to figure in many accidents is social pressure. Although at first it may not seem relevant to design, it has a strong influence on everyday behavior. In industrial settings, social pressures can lead to misinterpretation, mistakes, and accidents. To understand human error, it is essential to understand social pressure.

Complex problem-solving is required when one is faced with knowledge-based problems. In some cases, it can take teams of people days to understand what is wrong and the best ways to respond. This is especially true of situations where mistakes have been made in the diagnosis of the problem. Once the mistaken diagnosis is made, all information from then on is interpreted from the wrong point of view. Appropriate reconsiderations might only take place during team turnover, when new people come into the situation with a fresh viewpoint, allowing them to form different interpretations of the events. Sometimes just asking one or more of the team members to take a few hours’ break can lead to the same fresh analysis (although it is understandably difficult to convince someone who is battling an emergency situation to stop for a few hours).

In commercial installations, the pressure to keep systems running is immense. Considerable money might be lost if an expensive system is shut down. Operators are often under pressure not to do this. The result has at times been tragic. Nuclear power plants have been kept running longer than is safe. Airplanes have taken off before everything was ready and before the pilots had received permission. One such incident led to the largest accident in aviation history. Although the incident happened in 1977, a long time ago, the lessons learned are still very relevant today. In Tenerife, in the Canary Islands, a KLM Boeing 747 crashed during takeoff into a Pan American 747 that was taxiing on the same runway, killing 583 people. The KLM plane had not received clearance to take off, but the weather was starting to get bad and the crew had already been delayed for too long (even being on the Canary Islands was a diversion from the scheduled flight—bad weather had prevented their landing at their scheduled destination). And the Pan American flight should not have been on the runway, but there was considerable misunderstanding between the pilots and the air traffic controllers. Furthermore, the fog was coming in so thickly that neither plane’s crew could see the other.

In the Tenerife disaster, time and economic pressures were acting together with cultural and weather conditions. The Pan American pilots questioned their orders to taxi on the runway, but they continued anyway. The first officer of the KLM flight voiced minor objections to the captain, trying to explain that they were not yet cleared for takeoff (but the first officer was very junior to the captain, who was one of KLM’s most respected pilots). All in all, a major tragedy occurred due to a complex mixture of social pressures and logical explaining away of discrepant observations.

You may have experienced similar pressure, putting off refueling or recharging your car until it was too late and you ran out, sometimes in a truly inconvenient place (this has happened to me). What are the social pressures to cheat on school examinations, or to help others cheat? Or to not report cheating by others? Never underestimate the power of social pressures on behavior, causing otherwise sensible people to do things they know are wrong and possibly dangerous.

When I was in training to do underwater (scuba) diving, our instructor was so concerned about this that he said he would reward anyone who stopped a dive early in favor of safety. People are normally buoyant, so they need weights to get them beneath the surface. When the water is cold, the problem is intensified because divers must then wear either wet or dry suits to keep warm, and these suits add buoyancy. Adjusting buoyancy is an important part of the dive, so along with the weights, divers also wear air vests into which they continually add or remove air so that the body is close to neutral buoyancy. (As divers go deeper, increased water pressure compresses the air in their protective suits and lungs, so they become heavier: the divers need to add air to their vests to compensate.)

When divers have gotten into difficulties and needed to get to the surface quickly, or when they were at the surface close to shore but being tossed around by waves, some drowned because they were still encumbered by their heavy weights. Because the weights are expensive, the divers didn’t want to release them. In addition, if the divers released the weights and then made it back safely, they could never prove that the release was necessary, so they would feel embarrassed, creating self-induced social pressure. Our instructor was very aware of the resulting reluctance of people to take the critical step of releasing their weights when they weren’t entirely positive it was necessary. To counteract this tendency, he announced that if anyone dropped the weights for safety reasons, he would publicly praise the diver and replace the weights at no cost to the person. This was a very persuasive attempt to overcome social pressures.

Social pressures show up continually. They are usually difficult to document because most people and organizations are reluctant to admit these factors, so even if they are discovered in the process of an accident investigation, the results are often kept hidden from public scrutiny. A major exception is in the study of transportation accidents, where the review boards across the world tend to hold open investigations. The US National Transportation Safety Board (NTSB) is an excellent example of this, and its reports are widely used by many accident investigators and researchers of human error (including me).

Another good example of social pressures comes from yet another airplane incident. In 1982 an Air Florida flight from National Airport, Washington, DC, crashed during takeoff into the Fourteenth Street Bridge over the Potomac River, killing seventy-eight people, including four who were on the bridge. The plane should not have taken off because there was ice on the wings, but it had already been delayed for over an hour and a half; this and other factors, the NTSB reported, “may have predisposed the crew to hurry.” The accident occurred despite the first officer’s attempt to warn the captain, who was flying the airplane (the captain and first officer—sometimes called the copilot—usually alternate flying roles on different legs of a trip). The NTSB report quotes the flight deck recorder’s documenting that “although the first officer expressed concern that something ‘was not right’ to the captain four times during the takeoff, the captain took no action to reject the takeoff.” The NTSB summarized the causes this way:

The National Transportation Safety Board determines that the probable cause of this accident was the flight crew’s failure to use engine anti-ice during ground operation and takeoff, their decision to take off with snow/ice on the airfoil surfaces of the aircraft, and the captain’s failure to reject the takeoff during the early stage when his attention was called to anomalous engine instrument readings. (NTSB, 1982.)

Again we see social pressures coupled with time and economic forces.

Social pressures can be overcome, but they are powerful and pervasive. We drive when drowsy or after drinking, knowing full well the dangers, but talking ourselves into believing that we are exempt. How can we overcome these kinds of social problems? Good design alone is not sufficient. We need different training; we need to reward safety and put it above economic pressures. It helps if the equipment can make the potential dangers visible and explicit, but this is not always possible. To adequately address social, economic, and cultural pressures and to improve upon company policies are the hardest parts of ensuring safe operation and behavior.

CHECKLISTS

Checklists are powerful tools, proven to increase the accuracy of behavior and to reduce error, particularly slips and memory lapses. They are especially important in situations with multiple, complex requirements, and even more so where there are interruptions. With multiple people involved in a task, it is essential that the lines of responsibility be clearly spelled out. It is always better to have two people do checklists together as a team: one to read the instruction, the other to execute it. If, instead, a single person executes the checklist and then, later, a second person checks the items, the results are not as robust. The person following the checklist, feeling confident that any errors would be caught, might do the steps too quickly. But the same bias affects the checker. Confident in the ability of the first person, the checker often does a quick, less than thorough job.

One paradox of groups is that quite often, adding more people to check a task makes it less likely that it will be done right. Why? Well, if you were responsible for checking the correct readings on a row of fifty gauges and displays, but you knew that two people before you had checked them and that one or two people who come after you will check your work, you might relax, thinking that you don’t have to be extra careful. After all, with so many people looking, it would be impossible for a problem to exist without detection. But if everyone thinks the same way, adding more checks can actually increase the chance of error. A collaboratively followed checklist is an effective way to counteract these natural human tendencies.

In commercial aviation, collaboratively followed checklists are widely accepted as essential tools for safety. The checklist is done by two people, usually the two pilots of the airplane (the captain and first officer). In aviation, checklists have proven their worth and are now required on all US commercial flights. But despite the strong evidence confirming their usefulness, many industries still fiercely resist them. It makes people feel that their competence is being questioned. Moreover, when two people are involved, a junior person (in aviation, the first officer) is being asked to watch over the actions of the senior person. This is a strong violation of the lines of authority in many cultures.

Physicians and other medical professionals have strongly resisted the use of checklists. It is seen as an insult to their professional competence. “Other people might need checklists,” they complain, “but not me.” Too bad. To err is human: we are all subject to slips and mistakes when under stress, or under time or social pressure, or after being subjected to multiple interruptions, each essential in its own right. It is not a threat to professional competence to be human. Legitimate criticisms of particular checklists get used as an indictment against the concept of checklists. Fortunately, checklists are slowly starting to gain acceptance in medical situations. When senior personnel insist on the use of checklists, it actually enhances their authority and professional status. It took decades for checklists to be accepted in commercial aviation: let us hope that medicine and other professions will change more rapidly.

Designing an effective checklist is difficult. The design needs to be iterative, always being refined, ideally using the human-centered design principles of Chapter 6, continually adjusting the list until it covers the essential items yet is not burdensome to perform. Many people who object to checklists are actually objecting to badly designed lists: designing a checklist for a complex task is best done by professional designers in conjunction with subject matter experts.

Printed checklists have one major flaw: they force the steps to follow a sequential ordering, even where this is not necessary or even possible. With complex tasks, the order in which many operations are performed may not matter, as long as they are all completed. Sometimes items early in the list cannot be done at the time they are encountered in the checklist. For example, in aviation one of the steps is to check the amount of fuel in the plane. But what if the fueling operation has not yet been completed when this checklist item is encountered? Pilots will skip over it, intending to come back to it after the plane has been refueled. This is a clear opportunity for a memory-lapse error.

In general, it is bad design to impose a sequential structure on task execution unless the task itself requires it. This is one of the major benefits of electronic checklists: they can keep track of skipped items and can ensure that the list will not be marked as complete until all items have been done.
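The behavior described above can be sketched in a few lines of code. This is a minimal, illustrative model (the class and item names are made up, not taken from any real avionics or medical system): items may be completed in any order or skipped, but the list refuses to report completion while anything remains open.

```python
# Sketch of an electronic checklist: items may be done in any order or
# skipped, but the list cannot be marked complete until all are done.
class Checklist:
    def __init__(self, items):
        self.status = {item: False for item in items}

    def complete(self, item):
        self.status[item] = True

    def skip(self, item):
        # Skipping is allowed; the item simply stays open for later.
        pass

    def open_items(self):
        return [item for item, done in self.status.items() if not done]

    def is_complete(self):
        return not self.open_items()

preflight = Checklist(["set flaps", "check fuel", "test controls"])
preflight.complete("set flaps")
preflight.skip("check fuel")        # fueling not yet finished
preflight.complete("test controls")
print(preflight.is_complete())      # False: "check fuel" is still open
print(preflight.open_items())       # ['check fuel']
```

Unlike a printed list, the skipped fuel check cannot silently disappear: it remains in `open_items()` until someone attends to it.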

Reporting Error

If errors can be caught, then many of the problems they might lead to can often be avoided. But not all errors are easy to detect. Moreover, social pressures often make it difficult for people to admit to their own errors (or to report the errors of others). If people report their own errors, they might be fined or punished. Moreover, their friends may make fun of them. If a person reports that someone else made an error, this may lead to severe personal repercussions. Finally, most institutions do not wish to reveal errors made by their staff. Hospitals, courts, police systems, utility companies—all are reluctant to admit to the public that their workers are capable of error. These are all unfortunate attitudes.

The only way to reduce the incidence of errors is to admit their existence, to gather together information about them, and thereby to be able to make the appropriate changes to reduce their occurrence. In the absence of data, it is difficult or impossible to make improvements. Rather than stigmatize those who admit to error, we should thank those who do so and encourage the reporting. We need to make it easier to report errors, for the goal is not to punish, but to determine how the error occurred and change things so that it will not happen again.

CASE STUDY: JIDOKA—HOW TOYOTA HANDLES ERROR

The Toyota automobile company has developed an extremely efficient error-reduction process for manufacturing, widely known as the Toyota Production System. Among its many key principles is a philosophy called Jidoka, which Toyota says is “roughly translated as ‘automation with a human touch.’” If a worker notices something wrong, the worker is supposed to report it, sometimes even stopping the entire assembly line if a faulty part is about to proceed to the next station. (A special cord, called an andon, stops the assembly line and alerts the expert crew.) Experts converge upon the problem area to determine the cause. “Why did it happen?” “Why was that?” “Why is that the reason?” The philosophy is to ask “Why?” as many times as may be necessary to get to the root cause of the problem and then fix it so it can never occur again. As you might imagine, this can be rather discomforting for the person who found the error. But the report is expected, and when it is discovered that people have failed to report errors, they are punished, all in an attempt to get the workers to be honest.

POKA-YOKE: ERROR PROOFING

Poka-yoke is another Japanese method, this one invented by Shigeo Shingo, one of the Japanese engineers who played a major role in the development of the Toyota Production System. Poka-yoke translates as “error proofing” or “avoiding error.” One of the techniques of poka-yoke is to add simple fixtures, jigs, or devices to constrain the operations so that they are correct. I practice this myself in my home. One trivial example is a device to help me remember which way to turn the key on the many doors in the apartment complex where I live. I went around with a pile of small, circular, green stick-on dots and put them on each door beside its keyhole, with the green dot indicating the direction in which the key needed to be turned: I added signifiers to the doors. Is this a major error? No. But eliminating it has proven to be convenient. (Neighbors have commented on their utility, wondering who put them there.)

In manufacturing facilities, poka-yoke might be a piece of wood to help align a part properly, or perhaps plates designed with asymmetrical screw holes so that the plate can fit in only one position. Covering emergency or critical switches with a cover to prevent accidental triggering is another poka-yoke technique: this is obviously a forcing function. All the poka-yoke techniques involve a combination of the principles discussed in this book: affordances, signifiers, mapping, and constraints, and perhaps most important of all, forcing functions.

NASA’S AVIATION SAFETY REPORTING SYSTEM

US commercial aviation has long had an extremely effective system for encouraging pilots to submit reports of errors. The program has resulted in numerous improvements to aviation safety. It wasn’t easy to establish: pilots had severe self-induced social pressures against admitting to errors. Moreover, to whom would they report them? Certainly not to their employers. Not even to the Federal Aviation Administration (FAA), for then they would probably be punished. The solution was to let the National Aeronautics and Space Administration (NASA) set up a voluntary accident reporting system whereby pilots could submit semi-anonymous reports of errors they had made or observed in others (semi-anonymous because pilots put their name and contact information on the reports so that NASA could call to request more information). Once NASA personnel had acquired the necessary information, they would detach the contact information from the report and mail it back to the pilot. This meant that NASA no longer knew who had reported the error, which made it impossible for the airline companies or the FAA (which enforced penalties against errors) to find out who had submitted the report. If the FAA had independently noticed the error and tried to invoke a civil penalty or certificate suspension, the receipt of the self-report automatically exempted the pilot from punishment (for minor infractions).

When a sufficient number of similar errors had been collected, NASA would analyze them and issue reports and recommendations to the airlines and to the FAA. These reports also helped the pilots realize that their error reports were valuable tools for increasing safety. As with checklists, we need similar systems in the field of medicine, but they have not been easy to set up. NASA is a neutral body, charged with enhancing aviation safety but having no oversight authority, which helped gain the trust of pilots. There is no comparable institution in medicine: physicians are afraid that self-reported errors might lead them to lose their licenses or be subjected to lawsuits. But we can’t eliminate errors unless we know what they are. The medical field is starting to make progress, but it is a difficult technical, political, legal, and social problem.

Detecting Error

Errors do not necessarily lead to harm if they are discovered quickly. The different categories of errors differ in their ease of discovery. In general, action slips are relatively easy to discover; mistakes, much more difficult. Action slips are relatively easy to detect because it is usually easy to notice a discrepancy between the intended act and the one that was performed. But this detection can only take place if there is feedback. If the result of the action is not visible, how can the error be detected?

Memory-lapse slips are difficult to detect precisely because there is nothing to see. With a memory slip, the required action is not performed. When no action is done, there is nothing to detect. It is only when the lack of action allows some unwanted event to occur that there is hope of detecting a memory-lapse slip.

Mistakes are difficult to detect because there is seldom anything that can signal an inappropriate goal. And once the wrong goal or plan is decided upon, the resulting actions are consistent with that wrong goal, so careful monitoring of the actions not only fails to detect the erroneous goal, but, because the actions are done correctly, can inappropriately provide added confidence to the decision.

Faulty diagnoses of a situation can be surprisingly difficult to detect. You might expect that if the diagnosis was wrong, the actions would turn out to be ineffective, so the fault would be discovered quickly. But misdiagnoses are not random. Usually they are based on considerable knowledge and logic. The misdiagnosis is usually both reasonable and relevant to eliminating the symptoms being observed. As a result, the initial actions are apt to appear appropriate and helpful. This makes the problem of discovery even more difficult. The actual error might not be discovered for hours or days.

Memory-lapse mistakes are especially difficult to detect. Just as with a memory-lapse slip, the absence of something that should have been done is always more difficult to detect than the presence of something that should not have been done. The difference between memory-lapse slips and memory-lapse mistakes is that, in the first case, a single component of a plan is skipped, whereas in the second, the entire plan is forgotten. Which is easier to discover? At this point I must retreat to the standard answer science likes to give to questions of this sort: “It all depends.”

EXPLAINING AWAY MISTAKES

Mistakes can take a long time to be discovered. Hear a noise that sounds like a pistol shot and think: “Must be a car’s exhaust backfiring.” Hear someone yell outside and think: “Why can’t my neighbors be quiet?” Are we correct in dismissing these incidents? Most of the time we are, but when we’re not, our explanations can be difficult to justify.

Explaining away errors is a common problem in commercial accidents. Most major accidents are preceded by warning signs: equipment malfunctions or unusual events. Often, there is a series of apparently unrelated breakdowns and errors that culminate in major disaster. Why didn’t anyone notice? Because no single incident appeared to be serious. Often, the people involved noted each problem but discounted it, finding a logical explanation for the otherwise deviant observation.

THE CASE OF THE WRONG TURN ON A HIGHWAY

I’ve misinterpreted highway signs, as I’m sure most drivers have. My family was traveling from San Diego to Mammoth Lakes, California, a ski area about 400 miles north. As we drove, we noticed more and more signs advertising the hotels and gambling casinos of Las Vegas, Nevada. “Strange,” we said, “Las Vegas always did advertise a long way off—there is even a billboard in San Diego—but this seems excessive, advertising on the road to Mammoth.” We stopped for gasoline and continued on our journey. Only later, when we tried to find a place to eat supper, did we discover that we had missed a turn nearly two hours earlier, before we had stopped for gasoline, and that we were actually on the road to Las Vegas, not the road to Mammoth. We had to backtrack the entire two-hour segment, wasting four hours of driving. It’s humorous now; it wasn’t then.

Once people find an explanation for an apparent anomaly, they tend to believe they can now discount it. But explanations are based on analogy with past experiences, experiences that may not apply to the current situation. In the driving story, the prevalence of billboards for Las Vegas was a signal we should have heeded, but it seemed easily explained. Our experience is typical: some major industrial incidents have resulted from false explanations of anomalous events. But do note: usually these apparent anomalies should be ignored. Most of the time, the explanation for their presence is correct. Distinguishing a true anomaly from an apparent one is difficult.

IN HINDSIGHT, EVENTS SEEM LOGICAL

The contrast in our understanding before and after an event can be dramatic. The psychologist Baruch Fischhoff has studied explanations given in hindsight, where events seem completely obvious and predictable after the fact but completely unpredictable beforehand. Fischhoff presented people with a number of situations and asked them to predict what would happen: they were correct only at the chance level. When the actual outcome was not known by the people being studied, few predicted the actual outcome. He then presented the same situations along with the actual outcomes to another group of people, asking them to state how likely each outcome was: when the actual outcome was known, it appeared to be plausible and likely, and other outcomes appeared unlikely.

Hindsight makes events seem obvious and predictable. Foresight is difficult. During an incident, there are never clear clues. Many things are happening at once: workload is high, emotions and stress levels are high. Many things that are happening will turn out to be irrelevant. Things that appear irrelevant will turn out to be critical. The accident investigators, working with hindsight, knowing what really happened, will focus on the relevant information and ignore the irrelevant. But at the time the events were happening, the operators did not have information that allowed them to distinguish one from the other.

This is why the best accident analyses can take a long time to do. The investigators have to imagine themselves in the shoes of the people who were involved and consider all the information, all the training, and what the history of similar past events would have taught the operators. So, the next time a major accident occurs, ignore the initial reports from journalists, politicians, and executives who don’t have any substantive information but feel compelled to provide statements anyway. Wait until the official reports come from trusted sources. Unfortunately, this could be months or years after the accident, and the public usually wants answers immediately, even if those answers are wrong. Moreover, when the full story finally appears, newspapers will no longer consider it news, so they won’t report it. You will have to search for the official report. In the United States, the National Transportation Safety Board (NTSB) can be trusted. NTSB conducts careful investigations of all major aviation, automobile and truck, train, ship, and pipeline incidents. (Pipelines? Sure: pipelines transport coal, gas, and oil.)

Designing for Error

It is relatively easy to design for the situation where everything goes well, where people use the device in the way that was intended, and no unforeseen events occur. The tricky part is to design for when things go wrong.

Consider a conversation between two people. Are errors made? Yes, but they are not treated as such. If a person says something that is not understandable, we ask for clarification. If a person says something that we believe to be false, we question and debate. We don’t issue a warning signal. We don’t beep. We don’t give error messages. We ask for more information and engage in mutual dialogue to reach an understanding. In normal conversations between two friends, misstatements are taken as normal, as approximations to what was really meant. Grammatical errors, self-corrections, and restarted phrases are ignored. In fact, they are usually not even detected because we concentrate upon the intended meaning, not the surface features.

Machines are not intelligent enough to determine the meaning of our actions, but even so, they are far less intelligent than they could be. With our products, if we do something inappropriate, then as long as the action fits the proper format for a command, the product does it, even if it is outrageously dangerous. This has led to tragic accidents, especially in health care, where inappropriate design of infusion pumps and X-ray machines allowed extreme overdoses of medication or radiation to be administered to patients, leading to their deaths. In financial institutions, simple keyboard errors have led to huge financial transactions, far beyond normal limits.

Even simple checks for reasonableness would have stopped all of these errors. (This is discussed at the end of the chapter under the heading “Sensibility Checks.”)

Many systems compound the problem by making it easy to err but difficult or impossible to discover error or to recover from it. It should not be possible for one simple error to cause widespread damage. Much can be done to prevent this.

As this chapter demonstrates, we know a lot about errors. For example, novices are more likely to make mistakes than slips, whereas experts are more likely to make slips. Mistakes often arise from ambiguous or unclear information about the current state of a system, the lack of a good conceptual model, and inappropriate procedures. Recall that most mistakes result from an erroneous choice of goal or plan, or from erroneous evaluation and interpretation. All of these come about through poor information provided by the system about the choice of goals and the means to accomplish them (plans), and poor-quality feedback about what has actually happened.

A major source of error, especially of memory-lapse errors, is interruption. When an activity is interrupted by some other event, the cost of the interruption is far greater than the loss of the time required to deal with the interruption: there is also the cost of resuming the interrupted activity. To resume, it is necessary to remember precisely the previous state of the activity: what the goal was, where one was in the action cycle, and the relevant state of the system. Most systems make it difficult to resume after an interruption.

Most discard critical information that is needed by the user to remember the numerous small decisions that had been made, the things that were in the person’s short-term memory, to say nothing of the current state of the system. What still needs to be done? Maybe I was finished? It is no wonder that many slips and mistakes are the result of interruptions.

Multitasking, whereby we deliberately do several tasks simultaneously, erroneously appears to be an efficient way of getting a lot done. It is much beloved by teenagers and busy workers, but in fact, all the evidence points to severe degradation of performance, increased errors, and a general lack of both quality and efficiency. Doing two tasks at once takes longer than the sum of the times it would take to do each alone. Even as simple and common a task as talking on a hands-free cell phone while driving leads to serious degradation of driving skills. One study even showed that cell phone usage during walking led to serious deficits: “Cell phone users walked more slowly, changed directions more frequently, and were less likely to acknowledge other people than individuals in the other conditions. In the second study, we found that cell phone users were less likely to notice an unusual activity along their walking route (a unicycling clown)” (Hyman, Boss, Wise, McKenzie, & Caggiano, 2010).

A large percentage of medical errors are due to interruptions. In aviation, where interruptions were also determined to be a major problem during the critical phases of flying—landing and takeoff—the US Federal Aviation Administration (FAA) requires what it calls a “Sterile Cockpit Configuration,” whereby pilots are not allowed to discuss any topic not directly related to the control of the airplane during these critical periods. In addition, the flight attendants are not permitted to talk to the pilots during these phases (which has at times led to the opposite error—failure to inform the pilots of emergency situations).

Establishing similar sterile periods would be of great benefit to many professions, including medicine and other safety-critical operations. My wife and I follow this convention in driving: when the driver is entering or leaving a high-speed highway, conversation ceases until the transition has been completed. Interruptions and distractions lead to errors, both mistakes and slips.

Warning signals are usually not the answer. Consider the control room of a nuclear power plant, the cockpit of a commercial aircraft, or the operating room of a hospital. Each has a large number of different instruments, gauges, and controls, all with signals that tend to sound similar because they all use simple tone generators to beep their warnings. There is no coordination among the instruments, which means that in major emergencies, they all sound at once. Most can be ignored anyway because they tell the operator about something that is already known. Each competes with the others to be heard, interfering with efforts to address the problem.

Unnecessary, annoying alarms occur in numerous situations. How do people cope? By disconnecting warning signals, taping over warning lights (or removing the bulbs), silencing bells, and basically getting rid of all the safety warnings. The problem comes after such alarms are disabled, either when people forget to restore the warning systems (there are those memory-lapse slips again), or if a different incident happens while the alarms are disconnected. At that point, nobody notices. Warnings and safety methods must be used with care and intelligence, taking into account the tradeoffs for the people who are affected.

The design of warning signals is surprisingly complex. They have to be loud or bright enough to be noticed, but not so loud or bright that they become annoying distractions. The signal has to both attract attention (act as a signifier of critical information) and also deliver information about the nature of the event that is being signified. The various instruments need to have a coordinated response, which means that there must be international standards and collaboration among the many design teams from different, often competing, companies. Although considerable research has been directed toward this problem, including the development of national standards for alarm management systems, the problem still remains in many situations.

More and more of our machines present information through speech. But like all approaches, this has both strengths and weaknesses. It allows precise information to be conveyed, especially when the person’s visual attention is directed elsewhere. But if several speech warnings operate at the same time, or if the environment is noisy, speech warnings may not be understood. Or if conversations among the users or operators are necessary, speech warnings will interfere. Speech warning signals can be effective, but only if used intelligently.

DESIGN LESSONS FROM THE STUDY OF ERRORS

Two sets of design lessons can be drawn from the study of errors: one for preventing errors before they occur and one for detecting and correcting them when they do occur. In general, the solutions follow directly from the preceding analyses.

ADDING CONSTRAINTS TO BLOCK ERRORS

Prevention often involves adding specific constraints to actions. In the physical world, this can be done through clever use of shape and size. For example, in automobiles, a variety of fluids are required for safe operation and maintenance: engine oil, transmission oil, brake fluid, windshield washer solution, radiator coolant, battery water, and gasoline. Putting the wrong fluid into a reservoir could lead to serious damage or even an accident. Automobile manufacturers try to minimize these errors by segregating the filling points, thereby reducing description-similarity errors. When the filling points for fluids that should be added only occasionally or by qualified mechanics are located separately from those for fluids used more frequently, the average motorist is unlikely to use the incorrect filling points. Errors in adding fluids to the wrong container can be minimized by making the openings different sizes and shapes, providing physical constraints against inappropriate filling. Different fluids often have different colors so that they can be distinguished. All these are excellent ways to minimize errors. Similar techniques are in widespread use in hospitals and industry. All of these are intelligent applications of constraints, forcing functions, and poka-yoke.

Electronic systems have a wide range of methods that could be used to reduce error. One is to segregate controls, so that easily confused controls are located far from one another. Another is to use separate modules, so that any control not directly relevant to the current operation is not visible on the screen, but requires extra effort to get to.

UNDO

Perhaps the most powerful tool for minimizing the impact of errors is the Undo command in modern electronic systems, reversing the operations performed by the previous command, wherever possible. The best systems have multiple levels of undoing, so it is possible to undo an entire sequence of actions. Obviously, undoing is not always possible. Sometimes, it is only effective if done immediately after the action. Still, it is a powerful tool for minimizing the impact of error. It is still amazing to me that many electronic and computer-based systems fail to provide a means to undo even where it is clearly possible and desirable.
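The usual way to implement multiple levels of undo is a stack of reversing operations: each command records how to undo itself, and the Undo command pops and applies these reversals in last-in, first-out order. The sketch below illustrates the idea with a toy text editor (the names are invented for this example, not drawn from any real application):

```python
# Sketch of multi-level undo: each command pushes its own inverse
# onto a stack; undo pops and runs the most recent inverse.
class Editor:
    def __init__(self):
        self.text = ""
        self.undo_stack = []

    def insert(self, s):
        self.text += s
        # Record the inverse operation: remove the characters just added.
        self.undo_stack.append(lambda n=len(s): self._truncate(n))

    def _truncate(self, n):
        self.text = self.text[:-n]

    def undo(self):
        if self.undo_stack:
            self.undo_stack.pop()()

e = Editor()
e.insert("Hello")
e.insert(", world")
e.undo()        # reverses only the most recent insert
print(e.text)   # Hello
```

Because every command leaves its inverse on the stack, calling `undo` repeatedly walks back through the entire sequence of actions, which is exactly the "multiple levels of undoing" described above.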

CONFIRMATION AND ERROR MESSAGES

Many systems try to prevent errors by requiring confirmation before a command will be executed, especially when the action will destroy something of importance. But these requests are usually ill-timed because after requesting an operation, people are usually certain they want it done. Hence the standard joke about such warnings:

Person: Delete “my most important file.”
System: Do you want to delete “my most important file”?
Person: Yes.
System: Are you certain?
Person: Yes!
System: “My most important file” has been deleted.
Person: Oh. Damn.

The request for confirmation seems like an irritant rather than an essential safety check because the person tends to focus upon the action rather than the object that is being acted upon. A better check would be a prominent display of both the action to be taken and the object, perhaps with the choice of “cancel” or “do it.” The important point is making salient what the implications of the action are. Of course, it is because of errors of this sort that the Undo command is so important. With traditional graphical user interfaces on computers, not only is Undo a standard command, but when files are “deleted,” they are actually simply moved from sight and stored in the file folder named “Trash,” so that in the above example, the person could open the Trash and retrieve the erroneously deleted file.

Confirmations have different implications for slips and mistakes. When I am writing, I use two very large displays and a powerful computer. I might have seven to ten applications running simultaneously. I have sometimes had as many as forty open windows. Suppose I activate the command that closes one of the windows, which triggers a confirmatory message: did I wish to close the window? How I deal with this depends upon why I requested that the window be closed. If it was a slip, the confirmation will be useful. If it was the result of a mistake, I am apt to ignore it. Consider these two examples:

A slip leads me to close the wrong window.

Suppose I intended to type the word We, but instead of typing Shift + W for the first character, I typed Command + W (or Control + W), the keyboard command for closing a window. Because I expected the screen to display an uppercase W, when a dialog box appeared, asking whether I really wanted to delete the file, I would be surprised, which would immediately alert me to the slip. I would cancel the action (an alternative thoughtfully provided by the dialog box) and retype the Shift + W, carefully this time.

A mistake leads me to close the wrong window.

Now suppose I really intended to close a window. I often use a temporary file in a window to keep notes about the chapter I am working on. When I am finished with it, I close it without saving its contents—after all, I am finished. But because I usually have multiple windows open, it is very easy to close the wrong one. The computer assumes that all commands apply to the active window—the one where the last actions were performed (and which contains the text cursor). But if I reviewed the temporary window prior to closing it, my visual attention is focused upon that window, and when I decide to close it, I forget that it is not the active window from the computer’s point of view. So I issue the command to shut the window, the computer presents me with a dialog box asking for confirmation, and I accept it, choosing the option not to save my work. Because the dialog box was expected, I didn’t bother to read it. As a result, I closed the wrong window and, worse, did not save any of the typing, possibly losing considerable work.

Warning messages are surprisingly ineffective against mistakes (even nice requests, such as the one shown in Chapter 4, Figure 4.6, page 143). Was this a mistake or a slip? Both. Issuing the “close” command while the wrong window was active is a memory-lapse slip. But deciding not to read the dialog box and accepting it without saving the contents is a mistake (two mistakes, actually).

What can a designer do? Several things, discussed in the sections that follow.

SENSIBILITY CHECKS

Electronic systems have another advantage over mechanical ones: they can check to make sure that the requested operation is sensible.

It is amazing that in today’s world, medical personnel can accidentally request a radiation dose a thousand times larger than normal and have the equipment meekly comply. In some cases, it isn’t even possible for the operator to notice the error.

Similarly, errors in stating monetary sums can lead to disastrous results, even though a quick glance at the amount would indicate that something was badly off. For example, there are roughly 1,000 Korean won to the US dollar. Suppose I wanted to transfer $1,000 into a Korean bank account in won ($1,000 is roughly 1,000,000 won). But suppose I enter the Korean number into the dollar field. Oops—I’m trying to transfer a million dollars. Intelligent systems would take note of the normal size of my transactions, querying if the amount was considerably larger than normal. For me, they would query the million-dollar request. Less intelligent systems would blindly follow instructions, even though I did not have a million dollars in my account (in fact, I would probably be charged a fee for overdrawing my account).
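A sensibility check of the kind described above can be very simple. This sketch compares a requested transfer against the account's recent history and flags anything wildly out of range (the function name, threshold, and sample figures are all illustrative assumptions, not any bank's actual rule):

```python
# Sketch of a sensibility check on a money transfer: flag any request
# far outside the account's usual transaction sizes before executing it.
from statistics import mean

def needs_confirmation(amount, past_amounts, factor=10):
    """Flag amounts more than `factor` times the historical average."""
    if not past_amounts:
        return False  # no history yet, nothing to compare against
    return amount > factor * mean(past_amounts)

history = [120.0, 950.0, 1_000.0, 480.0]       # typical transfers, in dollars
print(needs_confirmation(1_000, history))      # False: an ordinary transfer
print(needs_confirmation(1_000_000, history))  # True: query the user first
```

The point is not the particular threshold but the principle: the system has enough information to notice that a million-dollar request is wildly abnormal, and it should ask before blindly complying.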

Sensibility checks, of course, are also the answer to the serious errors caused when inappropriate values are entered into hospital medication and X-ray systems or in financial transactions, as discussed earlier in this chapter.

MINIMIZING SLIPS

Slips most frequently occur when the conscious mind is distracted, either by some other event or simply because the action being performed is so well learned that it can be done automatically, without conscious attention. As a result, the person does not pay sufficient attention to the action or its consequences. It might therefore seem that one way to minimize slips is to ensure that people always pay close, conscious attention to the acts being done.

Bad idea. Skilled behavior is subconscious, which means it is fast, effortless, and usually accurate. Because it is so automatic, we can type at high speeds even while the conscious mind is occupied composing the words. This is why we can walk and talk while navigating traffic and obstacles. If we had to pay conscious attention to every little thing we did, we would accomplish far less in our lives. The information processing structures of the brain automatically regulate how much conscious attention is being paid to a task: conversations automatically pause when crossing the street amid busy traffic. Don’t count on it, though: if too much attention is focused on something else, the fact that the traffic is getting dangerous might not be noted.

Many slips can be minimized by ensuring that the actions and their controls are as dissimilar as possible, or at least, as physically far apart as possible. Mode errors can be eliminated by the simple expedient of eliminating most modes and, if this is not possible, by making the modes very visible and distinct from one another.

The best way of mitigating slips is to provide perceptible feedback about the nature of the action being performed, then very perceptible feedback describing the new resulting state, coupled with a mechanism that allows the error to be undone. For example, the use of machine-readable codes has led to a dramatic reduction in the delivery of wrong medications to patients. Prescriptions sent to the pharmacy are given electronic codes, so the pharmacist can scan both the prescription and the resulting medication to ensure they are the same. Then, the nursing staff at the hospital scans both the label of the medication and the tag worn around the patient’s wrist to ensure that the medication is being given to the correct individual. Moreover, the computer system can flag repeated administration of the same medication. These scans do increase the workload, but only slightly. Other kinds of errors are still possible, but these simple steps have already been proven worthwhile.
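The logic behind those scans is simple enough to sketch. The code below is my own illustration of the three checks the text describes (medication matches prescription, patient matches wristband, no repeated administration); the data schema and all the codes are invented, not drawn from any real hospital system:

```python
def check_administration(prescription, scanned_med, scanned_wristband, given_log):
    """Return a list of problems found by the two scans; empty means safe.

    prescription: a dict with 'med' and 'patient' codes (invented schema).
    given_log: a set of (patient, med) pairs already administered.
    """
    problems = []
    if scanned_med != prescription["med"]:
        problems.append("medication does not match prescription")
    if scanned_wristband != prescription["patient"]:
        problems.append("wrong patient")
    if (prescription["patient"], prescription["med"]) in given_log:
        problems.append("repeated administration")
    return problems

rx = {"med": "rx-42", "patient": "pt-17"}
log = {("pt-17", "rx-42")}  # this dose was already given
print(check_administration(rx, "rx-42", "pt-17", log))
# ['repeated administration']
```

Each check is a single comparison of machine-readable codes, which is why the added workload is slight compared with the errors it blocks.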

Common engineering and design practices seem as if they are deliberately intended to cause slips. Rows of identical controls or meters are a sure recipe for description-similarity errors. Internal modes that are not very conspicuously marked are a clear driver of mode errors. Situations with numerous interruptions, yet where the design assumes undivided attention, are a clear enabler of memory lapses—and almost no equipment today is designed to support the numerous interruptions that so many situations entail. And failure to provide assistance and visible reminders for performing infrequent procedures that are similar to much more frequent ones leads to capture errors, where the more frequent actions are performed rather than the correct ones for the situation. Procedures should be designed so that the initial steps are as dissimilar as possible.

The important message is that good design can prevent slips and mistakes. Design can save lives.

THE SWISS CHEESE MODEL OF HOW ERRORS LEAD TO ACCIDENTS

Fortunately, most errors do not lead to accidents. Accidents often have numerous contributing causes, no single one of which is the root cause of the incident.

James Reason likes to explain this by invoking the metaphor of multiple slices of Swiss cheese, the cheese famous for being riddled with holes (Figure 5.3). If each slice of cheese represents a condition in the task being done, an accident can happen only if holes in all four slices of cheese are lined up just right. In well-designed systems, there can be many equipment failures, many errors, but they will not lead to an accident unless they all line up precisely. Any leakage—passageway through a hole—is most likely blocked at the next level. Well-designed systems are resilient against failure.
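The resilience the metaphor describes can be put in rough numbers. In this sketch (my own illustration, with made-up probabilities), each slice independently fails to block a propagating error with some small probability; an accident requires every slice to fail at once, so each added layer multiplies the overall risk down:

```python
from math import prod

def accident_probability(hole_probs):
    """Chance that an error passes through every defensive layer,
    assuming the layers fail independently.

    hole_probs: per-layer probability that the layer fails to block
    the error (i.e., the error finds a hole in that slice).
    """
    return prod(hole_probs)

three_layers = [0.01, 0.02, 0.05]
four_layers = three_layers + [0.05]   # one extra slice of cheese
print(accident_probability(three_layers))  # roughly one in 100,000
print(accident_probability(four_layers))   # roughly one in 2,000,000
```

The independence assumption is exactly what the metaphor's third lesson targets: using very different mechanisms in different subparts of the system is an attempt to keep the holes from lining up, which is what keeps the probabilities multiplicative.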

This is why the attempt to find “the” cause of an accident is usually doomed to fail. Accident investigators, the press, government officials, and the everyday citizen like to find simple explanations for the cause of an accident. “See, if the hole in slice A

Reason’s Swiss Cheese Model of Accidents. Accidents usually have multiple causes, whereby had any single one of those causes not happened, the accident would not have occurred. The British accident researcher James Reason describes this through the metaphor of slices of Swiss cheese: unless the holes all line up perfectly, there will be no accident. This metaphor provides two lessons: first, do not try to find “the” cause of an accident; second, we can decrease accidents and make systems more resilient by designing them to have extra precautions against error (more slices of cheese), fewer opportunities for slips, mistakes, or equipment failure (fewer holes), and very different mechanisms in the different subparts of the system (trying to ensure that the holes do not line up). (Drawing based upon one by Reason, 1990.)

FIGURE 5.3

had been slightly higher, we would not have had the accident. So throw away slice A and replace it.” Of course, the same can be said for slices B, C, and D (and in real accidents, the number of cheese slices would sometimes measure in the tens or hundreds). It is relatively easy to find some action or decision that, had it been different, would have prevented the accident. But that does not mean that this was the cause of the accident. It is only one of the many causes: all the items have to line up.

You can see this in most accidents by the “if only” statements. “If only I hadn’t decided to take a shortcut, I wouldn’t have had the accident.” “If only it hadn’t been raining, my brakes would have worked.” “If only I had looked to the left, I would have seen the car sooner.” Yes, all those statements are true, but none of them is “the” cause of the accident. Usually, there is no single cause. Yes, journalists and lawyers, as well as the public, like to know the cause so someone can be blamed and punished. But reputable investigating agencies know that there is not a single cause, which is why their investigations take so long. Their responsibility is to understand the system and make changes that would reduce the chance of the same sequence of events leading to a future accident. The Swiss cheese metaphor suggests several ways to reduce accidents:

• Add more slices of cheese.
• Reduce the number of holes (or make the existing holes smaller).
• Alert the human operators when several holes have lined up.

Each of these has operational implications. More slices of cheese means more lines of defense, such as the requirement in aviation and other industries for checklists, where one person reads the items, another does the operation, and the first person checks the operation to confirm it was done appropriately.

Reducing the number of critical safety points where error can occur is like reducing the number or size of the holes in the Swiss cheese. Properly designed equipment will reduce the opportunity for slips and mistakes, which is like reducing the number of holes and making the ones that remain smaller. This is precisely how the safety level of commercial aviation has been dramatically improved. Deborah Hersman, chair of the National Transportation Safety Board, described the design philosophy as:

U.S. airlines carry about two million people through the skies safely every day, which has been achieved in large part through design redundancy and layers of defense.

Design redundancy and layers of defense: that’s Swiss cheese. The metaphor illustrates the futility of trying to find the one underlying cause of an accident (usually some person) and punishing the culprit. Instead, we need to think about systems, about all the interacting factors that lead to human error and then to accidents, and devise ways to make the systems, as a whole, more reliable.

When Good Design Isn’t Enough

WHEN PEOPLE REALLY ARE AT FAULT

I am sometimes asked whether it is really right to say that people are never at fault, that it is always bad design. That’s a sensible question. And yes, of course, sometimes it is the person who is at fault.

Even competent people can lose competency if sleep deprived, fatigued, or under the influence of drugs. This is why we have laws banning pilots from flying if they have been drinking within some specified period and why we limit the number of hours they can fly without rest. Most professions that involve the risk of death or injury have similar regulations about drinking, sleep, and drugs. But everyday jobs do not have these restrictions. Hospitals often require their staff to go without sleep for durations that far exceed the safety requirements of airlines. Why? Would you be happy having a sleep-deprived physician operating on you? Why is sleep deprivation considered dangerous in one situation and ignored in another?

Some activities have height, age, or strength requirements. Others require considerable skills or technical knowledge: people not trained or not competent should not be doing them. That is why many activities require government-approved training and licensing. Some examples are automobile driving, airplane piloting, and medical practice. All require instructional courses and tests. In aviation, it isn’t sufficient to be trained: pilots must also keep in practice by flying some minimum number of hours per month. Drunk driving is still a major cause of automobile accidents: this is clearly the fault of the drinker. Lack of sleep is another major culprit in vehicle accidents. But because people occasionally are at fault does not justify the attitude that assumes they are always at fault. The far greater percentage of accidents is the result of poor design, either of equipment or, as is often the case in industrial accidents, of the procedures to be followed.

As noted in the discussion of deliberate violations earlier in this chapter (page 169), people will sometimes deliberately violate procedures and rules, perhaps because they cannot get their jobs done otherwise, perhaps because they believe there are extenuating circumstances, and sometimes because they are taking the gamble that the relatively low probability of failure does not apply to them. Unfortunately, if someone does a dangerous activity that only results in injury or death one time in a million, that can lead to hundreds of deaths annually across the world, with its 7 billion people. One of my favorite examples in aviation is of a pilot who, after experiencing low oil-pressure readings in all three of his engines, stated that it must be an instrument failure because it was a one-in-a-million chance that the readings were true. He was right in his assessment, but unfortunately, he was the one. In the United States alone there were roughly 9 million flights in 2012. So, a one-in-a-million chance could translate into nine incidents.

Sometimes, people really are at fault.

Resilience Engineering

In industrial applications, accidents in large, complex systems such as oil wells, oil refineries, chemical processing plants, electrical power systems, transportation, and medical services can have major impacts on the company and the surrounding community.

Sometimes the problems do not arise in the organization but outside it, such as when fierce storms, earthquakes, or tidal waves demolish large parts of the existing infrastructure. In either case, the question is how to design and manage these systems so that they can restore services with a minimum of disruption and damage. An important approach is resilience engineering, with the goal of designing systems, procedures, management, and the training of people so they are able to respond to problems as they arise. It strives to ensure that the design of all these things—the equipment, procedures, and communication both among workers and also externally to management and the public—are continually being assessed, tested, and improved.

Thus, major computer providers can deliberately cause errors in their systems to test how well the company can respond. This is done by deliberately shutting down critical facilities to ensure that the backup systems and redundancies actually work. Although it might seem dangerous to do this while the systems are online, serving real customers, the only way to test these large, complex systems is by doing so. Small tests and simulations do not carry the complexity, stress levels, and unexpected events that characterize real system failures.
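The spirit of such a failure drill can be shown in miniature. This is a toy model of my own (the service and replica names are invented): deliberately take one replica down and confirm that requests still get served by the survivors.

```python
import random

class Replica:
    """A toy stand-in for one redundant copy of a service."""
    def __init__(self, name):
        self.name, self.up = name, True

    def handle(self, request):
        if not self.up:
            raise RuntimeError(f"{self.name} is down")
        return f"{self.name} served {request}"

def serve(replicas, request):
    """Try each replica in turn, failing over past any that are down."""
    for r in replicas:
        try:
            return r.handle(request)
        except RuntimeError:
            continue  # fail over to the next replica
    raise RuntimeError("total outage")

cluster = [Replica("a"), Replica("b"), Replica("c")]
random.choice(cluster).up = False       # the deliberate failure
print(serve(cluster, "GET /status"))    # still served by a survivor
```

A real drill of this kind runs against live systems precisely because, as the text notes, small simulations like this one cannot reproduce the stress and surprise of a real outage; the sketch only shows the shape of the test.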

As Erik Hollnagel, David Woods, and Nancy Leveson, the authors of an early influential series of books on the topic, have skillfully summarized:

Resilience engineering is a paradigm for safety management that focuses on how to help people cope with complexity under pressure to achieve success. It strongly contrasts with what is typical today—a paradigm of tabulating error as if it were a thing, followed by interventions to reduce this count. A resilient organisation treats safety as a core value, not a commodity that can be counted. Indeed, safety shows itself only by the events that do not happen! Rather than view past success as a reason to ramp down investments, such organisations continue to invest in anticipating the changing potential for failure because they appreciate that their knowledge of the gaps is imperfect and that their environment constantly changes. One measure of resilience is therefore the ability to create foresight—to anticipate the changing shape of risk,

before failure and harm occurs. (Reprinted by permission of the publishers. Hollnagel, Woods, & Leveson, 2006, p. 6.)

The Paradox of Automation

Machines are getting smarter. More and more tasks are becoming fully automated. As this happens, there is a tendency to believe that many of the difficulties involved with human control will go away. Across the world, automobile accidents kill and injure tens of millions of people every year. When we finally have widespread adoption of self-driving cars, the accident and casualty rate will probably be dramatically reduced, just as automation in factories and aviation has increased efficiency while lowering both error and the rate of injury.

When automation works, it is wonderful, but when it fails, the resulting impact is usually unexpected and, as a result, dangerous. Today, automation and networked electrical generation systems have dramatically reduced the amount of time that electrical power is not available to homes and businesses. But when the electrical power grid goes down, it can affect huge sections of a country and take many days to recover. With self-driving cars, I predict that we will have fewer accidents and injuries, but that when there is an accident, it will be huge.

Automation keeps getting more and more capable. Automatic systems can take over tasks that used to be done by people, whether it is maintaining the proper temperature, automatically keeping an automobile within its assigned lane at the correct distance from the car in front, enabling airplanes to fly by themselves from takeoff to landing, or allowing ships to navigate by themselves. When the automation works, the tasks are usually done as well as or better than by people. Moreover, it saves people from the dull, dreary routine tasks, allowing more useful, productive use of time, reducing fatigue and error. But when the task gets too complex, automation tends to give up. This, of course, is precisely when it is needed the most. The paradox is that automation can take over the dull, dreary tasks, but fail with the complex ones.

When automation fails, it often does so without warning. This is a situation I have documented very thoroughly in my other books and many of my papers, as have many other people in the field of safety and automation. When the failure occurs, the human is “out of the loop.” This means that the person has not been paying much attention to the operation, and it takes time for the failure to be noticed and evaluated, and then to decide how to respond. In an airplane, when the automation fails, there is usually considerable time for the pilots to understand the situation and respond. Airplanes fly quite high: over 10 km (6 miles) above the earth, so even if the plane were to start falling, the pilots might have several minutes to respond. Moreover, pilots are extremely well trained. When automation fails in an automobile, the person might have only a fraction of a second to avoid an accident. This would be extremely difficult even for the most expert driver, and most drivers are not well trained.

In other circumstances, such as ships, there may be more time to respond, but only if the failure of the automation is noticed. In one dramatic case, the grounding of the cruise ship Royal Majesty in 1995, the failure lasted for several days and was only detected in the postaccident investigation, after the ship had run aground, causing several million dollars in damage. What happened? The ship’s location was normally determined by the Global Positioning System (GPS), but the cable that connected the satellite antenna to the navigation system somehow had become disconnected (nobody ever discovered how). As a result, the navigation system had switched from using GPS signals to “dead reckoning,” approximating the ship’s location by estimating speed and direction of travel, but the design of the navigation system didn’t make this apparent. As a result, as the ship traveled from Bermuda to its destination of Boston, it went too far south and went aground on Cape Cod, a peninsula jutting out of the water south of Boston. The automation had performed flawlessly for years, which increased people’s trust and reliance upon it, so the normal manual checking of location or careful perusal of the display (to see the tiny letters “dr” indicating “dead reckoning” mode) were not done. This was a huge mode error failure.
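Dead reckoning’s weakness is easy to see in a toy computation (mine, with invented numbers): the estimate simply integrates speed and heading, so any unmodeled current adds error that grows with every hour that nobody checks the display.

```python
import math

def dead_reckon(start, speed_knots, heading_deg, hours):
    """Advance an (x, y) position, in nautical miles, from speed and heading.

    This is the whole of dead reckoning: no external fix, just
    integration of the assumed velocity over time.
    """
    dx = speed_knots * hours * math.sin(math.radians(heading_deg))
    dy = speed_knots * hours * math.cos(math.radians(heading_deg))
    return (start[0] + dx, start[1] + dy)

# A 1-knot sideways current the crew doesn't know about pushes the
# real track off the estimate, hour after hour.
estimate = dead_reckon((0.0, 0.0), 15, 0, 30)   # where the system thinks it is
actual = dead_reckon((0.0, 0.0), 15, 0, 30)
actual = (actual[0] + 1.0 * 30, actual[1])      # 30 hours of unnoticed drift
error = math.dist(estimate, actual)
print(round(error))  # 30 nautical miles off after 30 hours
```

The numbers are arbitrary, but the structure of the failure is not: the error is invisible in the estimate itself, which is why a mode indicator reduced to the tiny letters “dr” was such a poor design.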

Design Principles for Dealing with Error

People are flexible, versatile, and creative. Machines are rigid, precise, and relatively fixed in their operations. There is a mismatch between the two, one that can lead to enhanced capability if used properly. Think of an electronic calculator. It doesn’t do mathematics like a person, but can solve problems people can’t. Moreover, calculators do not make errors. So the human plus calculator is a perfect collaboration: we humans figure out what the important problems are and how to state them. Then we use calculators to compute the solutions.

Difficulties arise when we do not think of people and machines as collaborative systems, but assign whatever tasks can be automated to the machines and leave the rest to people. This ends up requiring people to behave in machinelike fashion, in ways that differ from human capabilities. We expect people to monitor machines, which means keeping alert for long periods, something we are bad at. We require people to do repeated operations with the extreme precision and accuracy required by machines, again something we are not good at. When we divide up the machine and human components of a task in this way, we fail to take advantage of human strengths and capabilities but instead rely upon areas where we are genetically, biologically unsuited. Yet, when people fail, they are blamed.

What we call “human error” is often simply a human action that is inappropriate for the needs of technology. As a result, it flags a deficit in our technology. It should not be thought of as error. We should eliminate the concept of error: instead, we should realize that people can use assistance in translating their goals and plans into the appropriate form for technology.

Given the mismatch between human competencies and technological requirements, errors are inevitable. Therefore, the best designs take that fact as given and seek to minimize the opportunities for errors while also mitigating the consequences. Assume that every possible mishap will happen, so protect against them. Make actions reversible; make errors less costly. Here are key design principles:

• Put the knowledge required to operate the technology in the world. Don’t require that all the knowledge be in the head.
• Use the power of natural and artificial constraints; use forcing functions and natural mappings.
• Bridge the two gulfs, the Gulf of Execution and the Gulf of Evaluation. Make things visible, both for execution and evaluation.

We should deal with error by embracing it, by seeking to understand the causes and ensuring they do not happen again. We need to assist rather than punish or scold.
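The advice to make actions reversible has a standard implementation pattern: record each action together with its inverse on a stack. A minimal sketch (class and names invented for illustration):

```python
class UndoStack:
    """Record each action together with the inverse that reverses it."""
    def __init__(self):
        self._undo = []

    def do(self, action, inverse):
        """Perform the action and remember how to take it back."""
        action()
        self._undo.append(inverse)

    def undo(self):
        """Reverse the most recent action, if any."""
        if self._undo:
            self._undo.pop()()

doc = []
stack = UndoStack()
stack.do(lambda: doc.append("paragraph"), lambda: doc.pop())
print(doc)    # ['paragraph']
stack.undo()  # the slip costs one keystroke to reverse
print(doc)    # []
```

The design choice matters more than the code: once every action carries its inverse, an error stops being a catastrophe and becomes an approximation the user can refine.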

A ‘Pile’ Metaphor for Supporting Casual Organization of Information

Richard Mander, Gitta Salomon, and Yin Yin Wong

Human Interface Group, Advanced Technology
Apple Computer, Inc.
Cupertino, California 95014
(408) 996-1010

CHI ’92

ABSTRACT

A user study was conducted to investigate how people deal with the flow of information in their workspaces. Subjects reported that, in an attempt to quickly and informally manage their information, they created piles of documents. Piles were seen as complementary to the folder filing system, which was used for more formal archiving. A new desktop interface element – the pile – was developed and prototyped through an iterative process. The design includes direct manipulation techniques and support for browsing, and goes beyond physical world functionality by providing system assistance for automatic pile construction and reorganization. Preliminary user tests indicate the design is promising and raise issues that will be addressed in future work.

KEYWORDS: interface design, design process, interactive systems, user observation, desktop metaphor, interface metaphors, pile metaphor, information visualization, information organization, end-user programming.

INTRODUCTION

As the amount of information users confront on their computers increases, tools to organize and manipulate this information become increasingly important.

Today’s direct manipulation computer interfaces, such as the Macintosh® desktop interface [1], offer limited means of handling information. Users can manually place files within folders, organized in a rigid hierarchy. Users are responsible for appropriately filing all items; the system offers little assistance in this often tedious task. Recent enhancements, such as “aliases” [2], allow users to overcome a frequent problem, namely that an item belongs in more than one folder. However, the folder as the sole container type presents an impoverished set of possibilities.

The real world provides a rich array of organization systems. In the past, researchers have looked at how users find items in their physical offices [9]. We conducted a study to observe how users organize the large amounts of information they work with in their physical offices. Our study differed from previous work in that we looked at ways in which people use and interact with filing systems.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

We were also interested in how people work with assistants when dealing with information. By examining individuals’ information management schemes, we were able to extract and extrapolate a number of interesting interface ideas for a graphical interface. Our intent was not simply to emulate physical world functionality – several investigators have argued against this procedure [3,6] – but rather to leverage users’ knowledge of physical world capabilities to create an intuitive and powerful system that goes beyond the physical world. Using this approach, we sought to construct a design which provides new functionality and enhances the user interface.

Like Malone [9], we found that users like to group items spatially and often prefer to deal with information by creating physical piles of paper, rather than immediately categorizing it into specific folders. Computer users are confronted with large amounts of information, but currently are only provided with a hierarchical filing system for managing it.

Therefore, we propose that incorporating ‘piles’ within a graphical user interface could provide a number of interesting possibilities. Users have difficulty deciding where to file a new item; piling requires less mental effort. Today’s office assistants use piles as a way of suggesting categories to others; computerized agents might make use of them in the same way to convey a certain degree of imprecision in the suggested organization. Piles may also provide an appropriate representation for the results of information retrieval algorithms which are inherently inexact [13].

At least one system, BUSINESS [11], previously explored this interface metaphor as a construct within a text-based application programming language. For example, a user could initiate an action by typing an instruction such as “Empty the In Box onto the Work Pile.” However, there was no graphical representation, and so the system could do little more than allow the user to issue programmatic commands using a subset of English.

This paper provides both specific design ideas and insight into our design process as it progressed from user interviews to design to testing. The first section describes findings from observing and interviewing office workers. In particular, we report why folders were not always appropriate and how and when users found piles useful. We then describe the interface designs inspired by these observations. In the third section, we report results from informal tests of these designs. In conclusion, we describe directions for future work.

USER INTERVIEWS

As part of our design process, we undertook a user study to find out how people deal with information in their physical workspaces. Studies of this kind are important in helping us understand the user’s perspective. Our aim was to identify aspects of the real world work process which could offer insight into a new, more powerful interface.

We conducted interviews with thirteen men and women in Marketing, Support, Human Resources, and Technical departments within Apple Computer, Inc. The interviews lasted between 30 and 60 minutes and were conducted in the participant’s work area. All interviews were videotaped.

We asked people to describe the way information arrived in their work area, what they initially did with this information, where it went next, and how it was finally stored. Participants gave us a tour of their workspace to help us understand what purpose various cabinets, shelves, and storage devices served. We also took a small pile of documents with us and asked participants to judge what these documents were on the basis of their appearance. In this way, we could find out how they worked with completely unfamiliar information. Since we are interested in developing ways for the computer to help the user, we also asked people how they worked with assistants.

Findings

Our subjects used a variety of techniques – folders, file cabinets, file racks, piles, binders, card files, and bulletin boards – for managing the information in their offices. Since our primary concern in this paper is the uses of folders and piles, we’ll focus on observations relevant to these items.

Uses for folders. File folders were used in several ways. As could be expected, items were placed in folders which were in turn placed in file cabinets as a means of archiving information not currently needed. Users applied a variety of organizations to their file cabinets, ranging from totally random arrangements to strict alphabetical and color coded systems.

Users were sometimes dissatisfied with using folders in this way, because they were required to make an explicit decision about how to categorize individual items. This was often especially difficult with new information. One user said “I’m not always as good at categorizing things as I would like...it’s hard to get it right and I’m sort of a perfectionist, so I think that I should know exactly how I should do it...I like things in their place, but I can’t figure out exactly what place.”

One solution, identified by several users, would be to file the information in several places. However, even though copiers were near at hand, people did not choose to duplicate information in order to store it in more than one folder.

Folders were also used in informal ways. Many people mounted folders in racks, which enabled the folders to stand up. These folders were used for frequently accessed information – most often action items and items requiring regular maintenance, such as expense reports and things to read. Some users ordered or changed the orientation of the folders in their racks to make the most important or urgent information prominent. Folders were also used as a storage medium within piles, as a way to hold together a certain group of items. As one user commented, “... [I] folderize to keep things neat... there’s no hierarchy in there, because building a hierarchy takes too much time.”

Based on these observations, we inferred two things about the current folder-based interface offered on the Macintosh: the categorization problems are presumably amplified by the use of multiply nested folders, and support for more informal grouping techniques, such as racks, could be useful.

Piles: A less rigid categorization system. In addition to using folders, users grouped items into piles. For example, most workers kept information they needed in a specific working area. A common strategy was to create separate piles for each project and place them within the working area, at distances that reflected their urgency. Many workers also created piles for incoming information that they could not deal with immediately. The contents of users’ piles were clearly not restricted to paper documents – we observed piles composed of various items such as books, folders, reports, binders, cassette tapes, video tapes, postcards, envelopes, magazines, journals, and boxes.

People used piles instead of hierarchical folders because they did not require detailed categorization and they could be more easily reordered than a folder and file system. For many workers, the pile was viewed as an entity that was subject to change. Users reported that over a period of time, items within a pile would often be reshuffled and broken down into several sub-piles, and an informal process of categorization would begin. We noted several approaches to separating material within piles: some users stacked materials at different angles, while some placed dividers within the pile.

To the outside observer, an office containing piles often appears disorganized. However, all of our participants had several piles in their workspace and in most cases, they knew what was in each pile and could tell us quite a lot about its history. Seemingly disordered piles were often sensible to the person who created them, because they developed through many interactions over a long period of time. For instance, many piles grew as newer items were added to the top, and workers could tell where things were by their date, since the stack was ordered chronologically.

Piles: self-revealing, browseable. Several users remarked that the outer appearance of their piles conveniently allowed them to recognize particular items. Our subjects were also able to make use of the appearance of previously unseen piles. We asked them to look at a small pile of unfamiliar materials which we took with us to the interview. By looking at the pile’s outside form, they were able to infer quite a lot about its contents.

Consequently, we noted that piles facilitate browsing, and we observed four different browsing methods. In the edge browse method described above, people looked at the outside edges of the pile for clues about the items within. Information such as color, texture, and thickness was commonly used to judge the contents of a pile. In the restack method, people started at the top of the pile and dealt with each item in turn by lifting it off the pile, looking at it and then placing it somewhere other than back on the pile. In the hinge method, the items stayed in the pile, but the pile was hinged open at different points to display a single item. The final method was to spread out a pile and look at its contents in parallel.

Assistance with information management. Most participants did not have an assistant, but said they would welcome one. We asked those who did have assistants to describe how they worked together.

Assistants commonly took care of routine tasks, such as sorting mail into different categories. For people who had to deal with large amounts of information, their assistant acted as a filter, passing along urgent material and removing junk mail. Some assistants would reorganize the workspace and create a filing system so that information could be more easily organized. This usually happened in collaboration with the worker. Typically the assistant would suggest categories and discuss these with the worker before actually filing the material. The assistant would often not understand the technical content, but could scan through the materials looking for keywords that might help in the categorization task. Piles were often used by assistants to indicate potential categories. As one assistant remarked, “I’ll go into his office and put [labels] on piles on his floor and he’ll look at it and say ‘no’ or he’ll say ‘that’s pretty good’.”

FROM OBSERVATION TO DESIGN SKETCH

The next step in our process was to take our observations and develop a number of design sketches using MacroMind’s Director™ application [8], which supports scripted interaction and animation. These design sketches illustrated particular interaction techniques and were used to facilitate group discussion about interaction possibilities and the technology necessary to support them. They centered around the development of a new organizational element – the pile – which would support informal groupings of items on the computer desktop. In addition, we extended the metaphor to include functionality which could only be provided by the existence of a computer. The design sketches created are described below.

User-created piles. One objective was to allow users to create piles of mixed content and multiple data types. Each item within a pile would be represented by a miniature depicting its first page (see Figure 1a). We wanted to maintain the informal quality of physical piles by providing techniques which resemble real world actions: a pile is created by overlapping two items, and items are added to an existing pile by simply placing them on top (Figure 2). These user-created piles have a disheveled look.

Figure 1. Piles can contain various media, such as folders and individual documents. The pile in (a) was created by the user, and is consequently disheveled in appearance. In addition, the system can create piles for the user, based on rules explicitly stated by the user or developed through user-system collaboration. These piles have a neat appearance, as shown in (b), to indicate that there is a script, or set of rules, behind them.

Figure 2. Adding a document to a pile. If a document is positioned on top of a pile, the pile reacts to show that it can accept the new document. When the mouse button is released the document ‘drops’ onto the pile.

System-created piles. In addition, we postulated that the system could create piles for a user. As shown in Figure 1(b), these piles would have an orderly appearance. The system would assemble these piles using a script either developed through user-system collaboration, or explicitly written by the user. By creating and maintaining piles for the user, the system could serve as an office assistant.

How might this user-system collaboration work? Potentially, the user could supply sample documents as input for pile construction. By analyzing these documents, the system could offer various criteria for script construction. For example, the system could determine a document’s unique terms and let the user select the specific terms to use as piling criteria. Additionally, the system might extract structural data, such as the “Re:” line in a mail message, and ask the user if similar mail messages should be collected into the pile. Malone suggested a similar tack for automatic classification [9] and reported successful results in the Information Lens system [10]. As shown in Figure 3, our design provides a way for users to gradually learn to create scripts. As in the work of MacLean et al. [7], we wanted to provide a natural way for users to approach “tailorability” of piles as a part of the system.
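Scripting-by-example of this sort can be sketched in a few lines. The following Python is our own illustration, not the paper’s implementation; the function names and the crude term-selection heuristic are assumptions:

```python
from collections import Counter

def suggest_terms(sample_docs, background_docs, k=3):
    """Suggest piling criteria: terms frequent in the sample documents
    but absent from the rest of the corpus (a crude 'unique terms' test)."""
    sample = Counter(w for d in sample_docs for w in d.lower().split())
    background = Counter(w for d in background_docs for w in d.lower().split())
    unique = Counter({w: c for w, c in sample.items() if background[w] == 0})
    return [w for w, _ in unique.most_common(k)]

def make_pile_script(terms):
    """Return a predicate deciding whether a document belongs in the pile."""
    def script(doc_text):
        words = set(doc_text.lower().split())
        return any(t in words for t in terms)
    return script

# The user supplies two architecture memos as examples; the system
# proposes criteria and builds a script from them.
samples = ["memo on architecture review", "architecture sketch notes"]
others = ["travel expense report", "budget forecast q3"]
terms = suggest_terms(samples, others)       # 'architecture' ranks first
pile_script = make_pile_script(terms)
pile_script("new architecture proposal")     # True: routed into the pile
```

A real system would also surface the suggested terms for the user to confirm or edit, as in Figure 3, rather than applying them silently.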

Support for browsing. We wanted to support some of the browsing techniques users applied in their physical offices.

Figure 3. (a) depicts a mail area containing two scripted piles; one for important items, one for everything else. Over time, the user’s needs may change. As shown, an item in the ‘other’ pile has been removed because the user desires that it, and items like it, now appear in the important pile. When this item is dropped onto the important pile, as shown in (b), the system queries the user to find out whether this action is a singular event or whether the pile’s script should be modified. If the user chooses to modify the script, the system suggests criteria which could be used, as shown in (c). Alternatively, users can gain direct access to the scripting language and write their own criteria via the “Script...” button. Once the script is updated, items satisfying the new criteria visibly move to the ‘important’ pile.

Figure 4. Gesturing sideways with the mouse pointer, or with a finger in the case of a touch screen, causes a pile to spread out; items can now be directly manipulated.

Figure 5. Gesturing vertically with the mouse pointer as shown in (a), or with a finger in the case of a touch screen, invokes the viewing cone.

Figure 6. A ‘visualization environment’ that allows the user to select and visualize several criteria. In (a) the pile is both ordered and colored by date. In (b) the user chose to ‘pile by’ content. Therefore, the system separated the original pile into four content-based piles. Three are labeled with specific terms suggested by the system (e.g. “architecture”), appear neat, and are now scripted to maintain similar content. The remaining disheveled pile, “other,” contains items which did not fit into any of the other three piles.

By virtue of using miniatures of the actual documents, we offered edge browsing capabilities. In addition, we explored gestural inputs as a way to invoke other browsing methods. For example, a horizontal gesture would spread out a pile so that miniatures of each item’s first page were visible (Figure 4). A vertical up-and-down movement over a pile would allow users to browse a pile using a ‘viewing cone’ (Figure 5). When an item was visible in the viewing cone, the user could move through miniature representations of its pages by using the cursor keys on the keyboard. In the design sketches, a mouse was used to create the gestures, but we thought that these interaction techniques would be particularly well-suited to a touch screen display.
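One way the two gestures might be distinguished, purely as an illustration (the paper does not describe its recognizer; the thresholding heuristic below is our assumption), is to compare the horizontal and vertical extent of the pointer’s stroke over a pile:

```python
def classify_gesture(points, threshold=2.0):
    """Classify a stroke (a list of (x, y) pointer samples) over a pile.
    Returns 'spread' for a predominantly horizontal stroke,
    'viewing_cone' for a predominantly vertical one, else None."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    dx = max(xs) - min(xs)
    dy = max(ys) - min(ys)
    if dx > threshold * dy:
        return "spread"          # horizontal gesture: spread out the pile
    if dy > threshold * dx:
        return "viewing_cone"    # vertical gesture: browse with the cone
    return None                  # ambiguous stroke: do nothing

classify_gesture([(0, 0), (30, 2), (60, 3)])   # 'spread'
classify_gesture([(0, 0), (2, 40), (1, 80)])   # 'viewing_cone'
```

Requiring one axis to dominate the other by a factor (`threshold`) leaves ambiguous diagonal strokes unclassified, which addresses the accidental-spreading worry the test participants later raised.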

Managing piles. In physical offices, the user is confronted with many pile management tasks, such as re-piling and sub-piling when a particular pile becomes unwieldy or specific information must be retrieved. We wanted the system to act as a collaborator in dealing with these issues, and therefore designed a ‘visualizing environment’ which would help users understand the contents of piles. As shown in Figure 6, a user might choose to emphasize certain criteria in a pile by using order, color, or sub-piles. A user can elect to view combinations of criteria simultaneously. For example, the user could choose that items in a pile be ordered by date. The user might also request that the items in a pile be color coded according to their data type. Additionally, the user might have the system suggest subject-based sub-piles by choosing the “pile by content” option. The sub-piles deemed useful could be moved out of the visualization area for use on the desktop.
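The kinds of criteria such a visualization environment applies can be sketched as follows; this is our own illustration, and the attribute names (`date`, `type`) are assumptions rather than the prototype’s data model:

```python
from itertools import groupby

def order_by(items, key):
    """Order a pile's items by a criterion such as 'date'."""
    return sorted(items, key=lambda item: item[key])

def subpile_by(items, key):
    """Split a pile into sub-piles of items sharing a value for the
    criterion, e.g. 'pile by' data type (groupby requires sorted input)."""
    ordered = order_by(items, key)
    return {k: list(g) for k, g in groupby(ordered, key=lambda item: item[key])}

pile = [
    {"name": "memo",   "type": "text",  "date": "1992-03-01"},
    {"name": "chart",  "type": "graph", "date": "1992-01-15"},
    {"name": "report", "type": "text",  "date": "1992-02-10"},
]
order_by(pile, "date")[0]["name"]   # 'chart' (earliest date)
subpile_by(pile, "type")            # keys: 'graph', 'text'
```

Ordering and sub-piling compose naturally, which mirrors the paper’s point that users can view combinations of criteria simultaneously.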

TESTING USERS’ EXPECTATIONS OF PILES

The design sketches raised interest amongst our colleagues and were the focal point for discussions. However, since the interaction techniques in the sketches were ‘hard-wired,’ we did not know if they were usable and of value to end-users. Consequently, we undertook a user test of the interaction techniques. We constructed a suite of prototypes in Director that supported the interactivity we wished to test. We hoped to gauge people’s expectations about the inclusion of piles on the desktop. Our method was informal, resembling the type of testing described in [4,12], in order to provide us with quick results that could be used in design iteration.

Method

Five men and five women in nontechnical positions at Apple Computer were individually tested.

Piling models. Two different models for a pile were compared: a “document-centered” model and a “pile-centered” model. Possible ordering effects were avoided by varying the presentation of these two models across users.

In the “document-centered” task, the pile was represented as a collection of individual items. The user was presented with a series of colored rectangles within a white screen area. These rectangles were intended to represent files on a desktop. The rectangles could be selected and moved with the mouse. When one rectangle was placed over another rectangle, both would fall back to create a disheveled pile. Additional documents could be added to an existing pile by moving them over the pile and releasing the mouse. Documents could be removed by individually selecting them via any visible region and dragging them away from the pile. The pile itself could not be moved as a unit.

In the “pile-centered” task, piles were created in the same way, except that the pile acted more like a Macintosh folder – a single entity containing a collection of documents. When one document ‘rectangle’ was moved over another rectangle, the latter would highlight to indicate a pile would be formed if the mouse button was released. Subsequent documents moved to the pile would automatically drop onto the top of the pile. The pile itself, as opposed to independent documents, could then be dragged around the desktop by mouse-clicking on any part of it.
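The behavioral difference between the two models can be captured in a small sketch (a hypothetical API, not the actual prototype code): what a mouse-drag on a pile member picks up.

```python
class Pile:
    """Minimal model of the two piling behaviors compared in the test."""

    def __init__(self, items, pile_centered):
        self.items = list(items)
        self.pile_centered = pile_centered

    def drag_target(self, clicked_item):
        """In the pile-centered model, clicking any part of the pile drags
        the whole pile; in the document-centered model it drags only the
        clicked item."""
        return list(self.items) if self.pile_centered else [clicked_item]

doc_pile = Pile(["a", "b", "c"], pile_centered=False)
unit_pile = Pile(["a", "b", "c"], pile_centered=True)
doc_pile.drag_target("b")    # ['b']  – the individual item moves
unit_pile.drag_target("b")   # ['a', 'b', 'c']  – the pile moves as a unit
```

The test results below show users wanted both behaviors at once, which is exactly the conflict this one method exposes: the same click cannot mean both things without an extra distinction (a modifier key, a handle, or a gesture).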

Initiating browsing. In this task, participants tried out different ways of initiating pile browsing. They compared double-clicking and the horizontal gesture (shown in Figure 4) as ways to spread out a pile’s contents. They also compared double-clicking and vertical gesturing (Figure 5) for initiating browsing with the view cone.

Figure 7. Users were presented with three different ways a pile could react during viewing cone browsing: (a) stationary, (b) side shift, (c) document pull-out. In each case, the viewing cone contains a miniature of the first page of the document being examined. In style (a), the pile remains stationary. In (b), each item above the item being currently viewed is moved to the left side. In (c), the item currently being viewed temporarily moves out of the pile and to the right. Users preferred methods (b) and (c).

Viewing cone representations. In this task, participants were presented with three different visual representations of the viewing cone (see Figure 7). All were initiated by the same interaction – a single mouse click on the pile – but the order of presentation was varied for each user. When users settled on a preferable representation, they were shown how to use the keyboard to examine a miniature document’s pages while it was within the viewing cone. Users were shown which key would move the document forward one page, and which would move it backwards.

Finding items within a pile. In this task, participants were asked to use the viewing cone and paging ability to locate specific pages within documents in the pile. First they were asked to locate a picture of a hand on a mouse, for which they were shown a real report illustration as a stimulus. Then they were asked to locate three separate items within the pile: a colored bar graph, a map of North America which was contained in an Atlas document, and a document containing bullet point text. Users were not timed – this part of the test was aimed at determining if pile browsing was, in general, qualitatively suitable to users for locating information.

Results

Piling models. Although each user had a clear preference for one of our methods of pile creation (“pile-centered” or “document-centered”), neither method was judged to be clearly superior. In the “document-centered” model, users liked the ability to grab an individual document within a pile. A problem with this model was that users were not sure how to move a pile as a unit, since selecting any part of the pile led to moving an individual item rather than the pile as a whole. In the “pile-centered” model, users liked the way the system automatically aligned the items in the pile, the ability to move a pile as a unit, and the highlighting that indicated a pile was ready to accept an item. A problem with this model was the difficulty of selecting an individual item within the pile.

Most users also expected that any desktop item could be added to a pile. This led to discussion of what would happen if a document was placed on top of an isolated folder; users were unsure whether the item would go into the folder or if a new pile would be created. Most users thought that, based on their previous Macintosh experience, the item would go into the folder. This raises questions about how the pile metaphor fits into the current Macintosh desktop metaphor.

Most users asked for features generally available in desktop systems, but which were not present in the testing prototypes. For example, they wanted to be able to add a selected group of items to a pile, name piles, apply ordering schemes based on date, size, name, and kind, and control where a document was placed within a pile.

Since users liked and disliked certain features of each model, new design work will be undertaken to create models that both embody users’ preferences and are internally consistent. Further testing of these new models will be conducted.

Initiating browsing. Subjects tried using both gestures and mouse double-clicks to spread out a pile and also to obtain the viewing cone. In both cases, 9 out of the 10 participants preferred the double-click method. They found it faster and more Macintosh-like, which was not unexpected given that the subjects were all accustomed to the Macintosh. However, users also felt that the gestures were non-intuitive and somewhat ambiguous, and that the piles might be spread out accidentally while moving the cursor around on the screen. The gestures were originally intended for use on a touch screen, and most participants said that using a finger on the screen for the gesture might be more intuitive than using a mouse. This needs to be confirmed in a test with a touch screen. Note that we did not ask users which of the browsing methods they would want initiated by the double-click action; we only ascertained that they preferred double-clicking over gesturing.

In general, users thought they would make use of the ‘spread out’ view. Since all items were visible at once, it supported recognition and comparison. A few users expressed interest in viewing a grid layout rather than the overlapping one used in our testing prototype. While in this view, most users expected to both be able to act on individual items in standard ways, and move the documents as a group. In addition to the miniature representation of each item, many users requested that other information such as name, date, and kind be made available, and that the system provide representations which would specifically help the user differentiate similar items.

Viewing cone representations. Of the three viewing cone designs, the stationary pile version (Figure 7a) was rejected by all 10 users. All thought it was difficult to gauge where they were within the pile. Four users preferred the ‘side shift’ style (Figure 7b), 5 preferred the ‘document pull-out’ style (Figure 7c), and 1 user was undecided between these latter two designs. Both of the preferred designs clearly provided a view of an item’s location in the pile, in addition to a representation shown within the viewing cone.

Most users liked the viewing cone as a browsing method. It made it possible for them to identify items by their miniature representation without disturbing the pile’s state. Although it was not implemented in the prototype, once users had an item visible in the cone they often tried to grab it by either releasing and then quickly clicking the mouse, or by attempting to drag it from the pile. Users also liked the ability to view any page of an individual item, although not all were pleased with using the cursor keys on the keyboard to cause this action. Page numbering information (e.g. ‘1 of 10’) was found valuable while paging through the document, because it indicated the relative size of each item, as well as position within an item. One user desired random access to any page via selection of its number from the keyboard. A few users expressed interest in being able to target the viewing cone at any item on the desktop – a single document, a folder – and not just items in piles.

During the tests we noted a potential problem with the viewing cone implementation – users might need to depress the mouse button for a long time while browsing, which could lead to repetitive stress injury. A possible solution is to invoke the cone whenever the user clicks on a pile, thereby alleviating the need for the mouse button to be continuously depressed. This would also allow the user to click the mouse button again to select an item for removal from the pile while the viewing cone was active.

Finding items within the pile. We showed the subjects a physical version of a report and identified a specific illustration which we wanted them to find within a pile on the computer desktop. Users were asked to use the viewing cone and cursor keys to examine items in the pile. All of the users were able to find the illustration within a reasonable amount of time. As mentioned earlier, we were not concerned with timing information, but rather with the feasibility of the viewing cone for this task.

Since the picture was within a report which was bound in a green-edged cover, several users recognized the document within the pile by its clearly visible green spine. A common strategy was to subsequently move through the document’s miniature pages, looking for a small colored image in the top right corner of a page. Only one user took advantage of the page numbers on the miniature representations to identify the page. A few users did not expect the document on the computer to have the green spine that was present on the physical report, because they perceived it to be an addition which the system could not have known about. These users’ strategy was to start at the top of the pile and systematically look through every item, page by page, to find the picture.

We also asked the subjects to find a colored bar graph, a map of the North American continent within an atlas, and some bullet point text. Only some of the users found the items, and with some difficulty. Many users felt they would do better with their own information, and that their difficulty was due to a lack of familiarity with the material in the pile.

Several users discussed ways they would like the system to help them in such a situation. They commonly wanted the ability to search for specific data types, names, keywords, and other identifiers.

From this feedback, we inferred that it might be useful to give the user control over the information presented in the viewing cone. For instance, when searching for a graph, the user could select data type ‘graph’ as the search criterion, thereby causing the viewing cone to display only pages containing graph data types. This could be a powerful way to search, since it would enable the user to tailor the view according to current needs.
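A minimal sketch of such criterion-driven filtering (our illustration; the page data layout and names are assumptions):

```python
def cone_pages(pile, criterion=None):
    """Yield (item_name, page_number) pairs the viewing cone would show.
    With a criterion such as 'graph', only pages containing that data
    type are displayed; with no criterion, every page is shown."""
    for item in pile:
        for page_no, page_types in enumerate(item["pages"], start=1):
            if criterion is None or criterion in page_types:
                yield item["name"], page_no

# Each page is tagged with the data types it contains.
pile = [
    {"name": "report", "pages": [{"text"}, {"text", "graph"}]},
    {"name": "memo",   "pages": [{"text"}]},
]
list(cone_pages(pile, "graph"))   # [('report', 2)]
```

Filtering at the page level, rather than the item level, matches the scenario in the text: the cone skips straight to the page containing the graph.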

General discussion. At the end of the test, we asked users how they would use piles, and how the system might assist them. In general, users were receptive to the idea of having the system help them with their routine tasks, such as sorting incoming mail. Most users reported having between two and five mail systems, fax, and voice mail. They liked the idea of receiving all incoming information in a pile which could be accessed with the viewing cone. Within such a pile, they would want the system to prioritize items using characteristics such as sender, topic, content keywords, date, and urgency. We anticipate these priorities could be learned by the system over a period of time by watching the user interact with incoming information.
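Such rule-based prioritization might look like the following sketch; the weights and rule structure are our assumptions, standing in for whatever the system would learn from the user over time:

```python
def priority(message, rules):
    """Score an incoming message against user-specific rules; higher
    scores sort nearer the top of the incoming pile."""
    score = 0
    score += rules.get("senders", {}).get(message["sender"], 0)
    score += sum(weight for kw, weight in rules.get("keywords", {}).items()
                 if kw in message["body"].lower())
    if message.get("urgent"):
        score += rules.get("urgent_bonus", 0)
    return score

rules = {"senders": {"boss": 10}, "keywords": {"deadline": 5}, "urgent_bonus": 8}
inbox = [
    {"sender": "boss", "body": "Deadline moved up", "urgent": True},
    {"sender": "list", "body": "newsletter", "urgent": False},
]
sorted(inbox, key=lambda m: priority(m, rules), reverse=True)[0]["sender"]  # 'boss'
```

A learning variant would adjust these weights by watching which items the user opens first, as the paper anticipates.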

FUTURE WORK: FROM DESIGN SKETCH TO IMPLEMENTATION

There are many areas in which this work can proceed. A few of our current directions are described below.

Improving Designs and Working with Familiar Data

We plan to further explore the appropriate model for a pile – and how to possibly combine users’ expectations about its document-centeredness vs. pile-centeredness – by iterating on our previous Director prototypes. In addition, we intend to build prototypes that incorporate items of relevance to the individual being tested. The informal tests described above involved fabricated data that was unfamiliar to our subjects. In order to continue refining our designs, we need to construct prototypes that will allow subjects to interactively use piles for their own information over an extended period of time. An extension to the current Finder interface that would allow users to create and work with piles alongside folders would provide an excellent opportunity to further these designs. However, it may prove more feasible to undertake the next round of iteration by addressing a limited domain, such as a mail system.

Browsing by Other Criteria

The current design allows users to browse the contents of piles by viewing miniature representations of each item. While users found this feature useful in the tests, they also expressed interest in accessing other representations. We are currently exploring the types of “browse by...” criteria the system might offer. For example, users might want to selectively emphasize some data type during browsing, as in the case of ‘show me all the documents containing movies within this pile.’ When confronted with unfamiliar data, users might want to browse by textual abstract, since a miniature visual representation might not provide insight into an unknown item’s content.

Technology to Support Pile Interactions

The interface designs described in this paper were primarily inspired by observations of users, and not necessarily by existing technology. At the time of design, we were unsure if information retrieval techniques could adequately support some of the interactions, such as pile scripting-by-example or sub-pile creation. Consequently, we initiated a collaborative research effort with the Information Retrieval Team within Apple Computer’s Advanced Technology Group. One aim of this effort is to automatically create sub-piles. For example, a user could supply the system with a pile of documents, and based on the content of the documents within that pile, the system would suggest and describe suitable sub-piles.

As this work progresses, we plan to adapt our designs to reflect the technology that can be realized.

ACKNOWLEDGEMENTS

We would like to thank Dan Rose and Tim Oren for exploring information retrieval systems that will support sub-piling and other pile management operations; Stephanie Houde for creating the Director prototypes used in testing; Penny Bauersfeld and Leo Degen for their participation in early design sessions; and Tom Erickson for input on the user study design and feedback on the various drafts of this paper.

REFERENCES

[1] Apple Computer, Inc. Inside Macintosh, Volume VI. Addison-Wesley Publishing Company, Inc., Reading, MA, 1991.

[2] Apple Computer, Inc. The Apple Desktop Interface. Addison-Wesley Publishing Company, Inc., Reading, MA, 1987.

[3] Cole, I. Human aspects of office filing: Implications for the electronic office. In Proceedings of the Human Factors Society, 26th Annual Meeting, Seattle, Washington, 1982.

[4] Gomoll, K. Some techniques for observing users. In The Art of Human-Computer Interface Design (ed. Brenda Laurel). Addison-Wesley Publishing Company, Inc., Reading, MA, 1990, pp. 85-90.

[5] Ericsson, K.A. and Simon, H.A. Protocol Analysis. MIT Press, Cambridge, Massachusetts, 1984.

[6] Lansdale, M. The psychology of personal information management. Applied Ergonomics (1988), pp. 55-66.

[7] MacLean, A., Carter, K., Lövstrand, L. and Moran, T. User-tailorable systems: Pressing the issues with buttons. In Proceedings of CHI ’90 (Seattle, Washington). ACM, New York, 1990.

[8] MacroMind, Inc. Director™ 2.0. April 1990.

[9] Malone, T.W. How do people organize their desks? Implications for the design of office information systems. ACM Transactions on Office Information Systems, 1, 1 (January 1983), pp. 99-112.

[10] Malone, T.W., Grant, K.R., Turbak, F.A., Brobst, S.A. and Cohen, M.D. Intelligent information-sharing systems. Communications of the ACM, 30, 5 (May 1987), pp. 390-402.

[11] Miller, P., Tetelbaum, S. and Webb, K. BUSINESS – an end-user …

[12] Nielsen, J. Usability engineering at a discount. In G. Salvendy and M.J. Smith (Eds.), Designing and Using Human-Computer Interfaces and Knowledge Based Systems. Elsevier, Amsterdam, 1989.

[13] van Rijsbergen, C.J. Information Retrieval (2nd ed.). Butterworths, London, England, 1983.

Human-Computer Interaction: An Empirical Research Perspective

I. Scott MacKenzie

3.8 Interaction errors

Most of the analyses in this chapter are directed at the physical properties of the human-machine interface, such as degrees of freedom in 2D or 3D, or spatial and temporal relationships between input controllers and output displays. Human performance, although elaborated in Chapter 2, has not entered into the discussions here, except through secondary observations that certain interactions are better, worse, awkward, or unintuitive. At the end of the day, however, human performance is what counts. Physical properties, although instructive and essential, are secondary. Put another way, human performance is like food, while physical properties are like plates and bowls. It is good and nutritious food that we strive for.

Empirical research in HCI is largely about finding the physical properties and combinations that improve and enhance human performance. We conclude this chapter on interaction elements with comments on that nagging aspect of human performance that frustrates users: interaction errors. Although the time to complete a task can help or hinder by degree, errors only hinder. Absence of errors is, for the most part, invisible. As it turns out, errors—interaction errors—are germane to the HCI experience. The big errors are the easy ones—they get fixed. It is the small errors that are interesting.

As the field of HCI matures, a common view that emerges is that the difficult problems (in desktop computing) are solved, and now researchers should focus on new frontiers: mobility, surface computing, ubiquitous computing, online social networking, gaming, and so on. This view is partially correct. Yes, the emerging themes are exciting and fertile ground for HCI research, and many frustrating UI problems from the old days are gone. But desktop computing is still fraught with problems, lots of them. Let’s examine a few of these. Although the examples below are from desktop computing, there are counterparts in mobile computing. See also student exercise 3-8 at the end of this chapter.

The four examples developed in the following discussion were chosen for a specific reason. There is a progression between them. In severity, they range from serious problems causing loss of information to innocuous problems that most users rarely think about and may not even notice. In frequency, they range from rarely, if ever, occurring any more, to occurring perhaps multiple times every minute while users engage in computing activities. The big, bad problems are well traveled in the literature, with many excellent sources providing deep analyses of what went wrong and why (e.g., Casey, 1998, 2006; Cooper, 1999; Johnson, 2007; B. H. Kantowitz and Sorkin, 1983; Norman, 1988). While the big problems get lots of attention, and generally get fixed, the little ones tend to linger. We’ll see the effect shortly. Let’s begin with one of the big problems.

FIGURE 3.42

HCI has come a long way: (a) Today’s UIs consistently use the same, predictable dialog to alert the user to a potential loss of information. (b) Legacy dialog rarely (if ever) seen today.

Most users have, at some point, lost information while working on their computers. Instead of saving new work, it was mistakenly discarded, overwritten, or lost in some way. Is there any user who has not experienced this? Of course, nearly everyone has a story of losing data in some silly way. Perhaps there was a distraction. Perhaps they just didn’t know what happened. It doesn’t matter. It happened. An example is shown in Figure 3.42. A dialog box pops up and the user responds a little too quickly. Press enter with the “Save changes?” dialog box (Figure 3.42a) and all is well, but the same response with the “Discard changes?” dialog box spells disaster (Figure 3.42b). The information is lost. This scenario, told by Cooper (1999, 14), is a clear and serious UI design flaw. The alert reader will quickly retort, “Yes, but if the ‘Discard changes?’ dialog box defaults to ‘No,’ the information is safe.” But that misses the point. The point is that a user expectation is broken. Broken expectations sooner or later cause errors.

Today, systems and applications consistently use the “Save changes?” dialog box in Figure 3.42a. With time and experience, user expectations emerge and congeal. The “Save changes?” dialog box is expected, so we act without hesitating and all is well. But new users have no experiences, no expectations. They will develop them sure enough, but there will be some scars along the way. Fortunately, serious flaws like the “Discard changes?” dialog box are rare in desktop applications today.

The following is another error to consider. If prompted to enter a password, and caps_lock mode is in effect, logging on will fail and the password must be reentered. The user may not know that caps_lock is on. Perhaps a key-stroking error occurred. The password is reentered, slowly and correctly, with the caps_lock mode still in effect. Oops! Commit the same error a third time and further log-on attempts may be blocked.
This is not as serious as losing information by pressing enter in response to a renegade dialog box, but still, this is an interaction error. Or is it a design flaw? It is completely unnecessary, it is a nuisance, it slows our interaction, and it is easy to correct. Today, many systems have corrected this problem (Figure 3.43a), while others have not (Figure 3.43b).

The caps_lock error is not so bad. But it’s bad enough that it occasionally receives enough attention to be the beneficiary of the few extra lines of code necessary to pop up a caps_lock alert.

FIGURE 3.43

Entering a password: (a) Many systems alert the user if CAPS_LOCK is on. (b) Others do not.
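Those “few extra lines of code” can be sketched as follows; this is a hypothetical login handler (`check_password` and the caps-lock flag stand in for whatever the real windowing system provides):

```python
def login_prompt(username, password, caps_lock_on, check_password):
    """Return (ok, message). Warn about Caps Lock up front instead of
    letting the user silently fail and retype the password in vain."""
    if caps_lock_on:
        return False, "Warning: Caps Lock is on"
    if check_password(username, password):
        return True, "Logged in"
    return False, "Incorrect password"

# A stand-in credential check for illustration only.
check = lambda user, pw: (user, pw) == ("alice", "secret")
login_prompt("alice", "SECRET", caps_lock_on=True, check_password=check)
# → (False, 'Warning: Caps Lock is on')
```

The design point is the ordering: the modifier state is checked before the credentials, so the user’s expectation (“my password was right”) is never broken by an invisible mode.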

Let’s examine another small problem. In editing a document, suppose the user wishes to move some text to another location in the document. The task is easy. With the pointer positioned at the beginning of the text, the user presses and holds the primary mouse button and begins dragging. But the text spans several lines and extends past the viewable region. As the dragging extent approaches the edge of the viewable region, the user is venturing into a difficult situation. The interaction is about to change dramatically. (See Figure 3.44.) Within the viewable region, the interaction is position-control—the displacement of the mouse pointer controls the position of the dragging extent. As soon as the mouse pointer moves outside the viewable region, scrolling begins and the interaction becomes velocity-control—the displacement of the mouse pointer now controls the velocity of the dragging extent. User beware!

Once in velocity-control mode, it is anyone’s guess what will happen. This is a design flaw. A quick check of several applications while working on this example revealed dramatically different responses to the transition from position control to velocity control. In one case, scrolling was so fast that the dragging region extended to the end of the document in less time than the user could react (≈200 ms). In another case, the velocity of scrolling was controllable but frustratingly slow. Can you think of a way to improve this interaction? A two-handed approach, perhaps. Any technique that gets the job done and allows the user to develop an expectation of the interaction is an improvement. Perhaps there is some empirical research waiting in this area.
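One illustrative answer, sketched here under assumed parameters (`max_speed` and `gain` are ours, not from the text), is to make the velocity a continuous, clamped function of how far the pointer has left the viewport, so the transition is neither abrupt nor uncontrollably fast:

```python
def drag_response(pointer_y, view_top, view_bottom, max_speed=20.0, gain=0.5):
    """Inside the viewable region: position control (extend the selection
    to pointer_y). Outside: velocity control, with scroll speed
    proportional to the overshoot but clamped so scrolling can never
    outrun the user's reaction time."""
    if view_top <= pointer_y <= view_bottom:
        return ("position", pointer_y)
    # Signed overshoot: positive below the view, negative above it.
    overshoot = (pointer_y - view_bottom) if pointer_y > view_bottom \
                else (pointer_y - view_top)
    speed = max(-max_speed, min(max_speed, gain * overshoot))
    return ("velocity", speed)

drag_response(300, 0, 400)   # ('position', 300)
drag_response(500, 0, 400)   # ('velocity', 20.0)  – clamped
drag_response(404, 0, 400)   # ('velocity', 2.0)   – gentle near the edge
```

Because speed grows smoothly from zero at the edge, the user can form a stable expectation of the interaction, which is exactly the property MacKenzie argues is missing.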

Whether the velocity-control is too sensitive or too sluggish really doesn’t matter. What matters is that the user experience is broken or awkward. Any pretense to the interaction being facile, seamless, or transparent is gone. The user will recover, and no information will be lost, but the interaction has degraded to error recovery. This is a design error or, at the very least, a design-induced error. Let’s move on to a very minor error.

FIGURE 3.44

On the brink of hyper-speed scrolling. As the mouse pointer is dragged toward the edge of the viewable region, the user is precipitously close to losing control over the speed of dragging.

When an application or a dialog box is active, one of the UI components has focus and receives an event from the keyboard if a key is pressed. For buttons, focus is usually indicated with a dashed border (see “Yes” button in Figure 3.42). For input fields, it is usually indicated with a flashing insertion bar (“|”). “Focus advancement” refers to the progression of focus from one UI component to the next. There is widespread inconsistency in current applications in the way UI widgets acquire and lose focus and in the way focus advances from one component to the next. The user is in trouble most of the time. Here is a quick example. When a login dialog box pops up, can you immediately begin to enter your username and password? Sometimes yes, sometimes no. In the latter case, the entry field does not have focus. The user must click in the field with the mouse pointer or press tab to advance the focus point to the input field. Figure 3.43 provides examples. Both are real interfaces. The username field in (a) appears with focus; the same field in (b) appears without focus. The point is simply that users don’t know. This is a small problem (or is it an interaction error?), but it is entirely common. Focus uncertainty is everywhere in today’s user interfaces. Here is another, more specific example:

Many online activities, such as reserving an airline ticket or booking a vacation, require a user to enter data into a form. The input fields often require very specific information, such as a two-digit month, a seven-digit account number, and so on. When the information is entered, does focus advance automatically or is a user action required? Usually, we just don’t know. So we remain “on guard.” Figure 3.45 gives a real example from a typical login dialog box. The user is first requested to enter an account number. Account numbers are nine digits long, in three three-digit segments. After seeing the dialog box, the user looks at the keyboard and begins entering: 9, 8, 0, and then what? Chances are the user is looking at the keyboard while entering the numeric account number. Even though the user can enter the entire nine digits at once, interaction is halted after the first three-digit group because the user doesn’t know if the focus will automatically advance to the next field. There are no expectations here, because this example of GUI interaction has not evolved and stabilized to a consistent pattern. Data entry fields have not reached the evolutionary status of, for example, dialog boxes for saving versus discarding changes (Figure 3.42a). The user either acts, with an approximately 50 percent likelihood of committing an error, or pauses to attend to the display (Has the focus advanced to the next field?).
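The two competing behaviors are easy to state precisely. Here is a sketch of the auto-advance policy (the names and the list-based representation are illustrative, not any toolkit’s API):

```python
def next_focus(segments: list[str], lengths: list[int], focus: int,
               auto_advance: bool = True) -> int:
    """Index of the field that holds focus after a keystroke.

    With auto_advance, focus moves on when the current segment is
    full and a later segment exists; without it, the user must tab
    or click -- the very uncertainty described above."""
    full = len(segments[focus]) >= lengths[focus]
    last = focus == len(segments) - 1
    if auto_advance and full and not last:
        return focus + 1
    return focus
```

After typing 9, 8, 0 into the first of three three-digit fields, `next_focus(["980", "", ""], [3, 3, 3], 0)` returns 1 under one policy and, with `auto_advance=False`, 0 under the other. The user cannot know which without looking at the display.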

Strictly speaking, there is no gulf of evaluation here. Although not shown in the figure, the insertion point is present. After entering 980, the insertion point is either after the 0 in the first field, if focus did not advance, or at the beginning of the next field, if focus advanced. So the system does indeed “provide a physical representation that can be perceived and that is directly interpretable in terms of the intentions and expectations of the person” (Norman, 1988, p. 51). That’s not good enough. The user’s attention is on the keyboard while the physical presentation is on the system’s display. The disconnect is small, but nevertheless, a shift in the user’s attention is required.

FIGURE 3.45

Inconsistent focus advancement keeps the user on guard. “What do I do next?”

The absence of expectations keeps the user on guard. The user is often never quite sure what to do or what to expect. The result is a slight increase in the attention demanded during interaction, which produces a slight decrease in transparency. Instead of engaging in the task, attention is diverted to the needs of the computer. The user is like a wood carver who sharpens tools rather than creates works of art.

Where the consequences of errors are small, such as an extra button click or a gaze shift, errors tend to linger. For the most part, these errors aren’t on anyone’s radar. The programmers who build the applications have bigger problems to focus on, like working on their checklist of new features to add to version 2.0 of the application before an impending deadline.22 The little errors persist. Often, programmers’ discretion rules the day (Cooper, 1999, p. 47). An interaction scenario that makes sense to the programmer is likely to percolate through to the final product, particularly if it is just a simple thing like focus advancement. Do programmers ever discuss the nuances of focus advancement in building a GUI? Perhaps. But was the discussion framed in terms of the impact on the attention or gaze shifts imposed on the user? Not likely.

Each time a user shifts his or her attention (e.g., from the keyboard to the display and back), the cost is two gaze shifts. Each gaze shift, or saccade, takes from 70 to 700 ms (Card et al., 1983, p. 28).23 These little bits of interaction add up. They are the fine-grained details—the microstructures and microstrategies used by, or imposed on, the user. “Microstrategies focus on what designers would regard as the mundane aspects of interface design; the ways in which subtle features of interactive technology influence the ways in which users perform tasks” (W. D. Gray and Boehm-Davis, 2000, p. 322). Designers might view these fine-grained details as a mundane sidebar to the bigger goal, but the reality is different. Details are everything. User experiences exist as collections of microstrategies. Whether booking a vacation online or just hanging out with friends on a social networking site, big actions are collections of little actions. To the extent possible, user actions form the experience, our experience. It is unfortunate that they often exist simply to serve the needs of the computer or application.
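The arithmetic is worth making explicit. Assuming a mid-range 250 ms per gaze shift (Card et al. give 70 to 700 ms) and a hypothetical 20-field form, the overhead from focus uncertainty alone is:

```python
def attention_shift_cost(shifts_per_field: int, fields: int,
                         seconds_per_shift: float = 0.25) -> float:
    """Seconds spent only moving the eyes between keyboard and display.

    The 0.25 s default is an assumed mid-range value within the
    70-700 ms per-shift estimate (Card et al., 1983)."""
    return shifts_per_field * fields * seconds_per_shift

# Two gaze shifts per field (keyboard -> display -> keyboard),
# twenty fields: 2 * 20 * 0.25 = 10 seconds of pure overhead.
overhead = attention_shift_cost(2, 20)
```

Ten seconds buys nothing for the task; it serves only the interface.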

22 The reader who detects a modicum of sarcasm here is referred to Cooper (1999, 47–48 and elsewhere) for a full frontal assault on the insidious nature of feature bloat in software applications. The reference to version 2.0 of a nameless application is in deference to Johnson’s second edition of his successful book where the same tone appears in the title: GUI Bloopers 2.0. For a more sober and academic look at software bloat, feature creep, and the like, see McGrenere and Moore (2000).

23 An eye movement involves both a saccade and fixation. A saccade—the actual movement of the eye—is fast, about 30 ms. Fixations take longer as they involve perceiving the new stimulus and cognitive processing of the stimulus.

Another reason little errors tend to linger is that they are often deemed user errors, not design, programming, or system errors. These errors, like most, are more correctly called design-induced errors (Casey, 2006, p. 12). They occur “when designers of products, systems, or services fail to account for the characteristics and capabilities of people and the vagaries of human behavior” (Casey, 1998, p. 11). We should all do a little better.

Figure 3.46 illustrates a tradeoff between the cost of errors and the frequency of errors. There is no solid ground here, so it’s just a sketch. The four errors described above are shown. The claim is that high-cost errors occur with low frequency. They receive a lot of attention and they get dealt with. As systems mature and the big errors get fixed, designers shift their efforts to fixing less costly errors, like the caps_lock design-induced error, or consistently implementing velocity-controlled scrolling. Over time, more and more systems include reasonable and appropriate implementations of these interactions. Divergence in the implementations diminishes and, taken as a whole, there is an industry-wide coalescing toward the same consistent implementation (e.g., a popup alert for caps_lock). The ground is set for user expectation to take hold.

Of the errors noted in Figure 3.46, discard changes is ancient history (in computing terms), caps_lock is still a problem but is improving, scrolling frenzy is much more controlled in new applications, and focus uncertainty is, well, a mess. The cost is minor, but the error happens frequently.

In many ways, the little errors are the most interesting, because they slip past designers and programmers. A little self-observation and reflection goes a long way here. Observe little errors that you encounter. What were you trying to do? Did it work the first time, just as expected? Small interactions are revealing. What were your hands and eyes doing? Were your interactions quick and natural, or were there unnecessary or awkward steps? Could a slight reworking of the interaction help? Could an attention shift be averted with the judicious use of auditory or tactile feedback? Is there a “ready for input” auditory signal that could sound when an input field receives focus? Could this reduce the need for an attention shift? Would this improve user performance? Would it improve the user experience? Would users like it, or would it be annoying? The little possibilities add up. Think of them as opportunities for empirical research in HCI.

FIGURE 3.46

Trade-off between the cost of errors and the frequency of errors.

Human-Computer

Interaction

An Empirical Research Perspective

I. Scott MacKenzie

FIGURE 3.18

Button arrangements for an elevator control panel. (a) Correct. (b) Incorrect.

geographic regions have experienced and learned it differently. What is accepted in one region may differ from what is accepted in another.

If there is a physical contradiction, then the situation is different. Consider elevators (in buildings, not scrollbars). Early elevators didn’t have buttons to specify floors; they only had up and down buttons. Consider the two arrangements for the button controls shown in Figure 3.18. Clearly the arrangement in (a) is superior. When the up control is pressed, the display (the elevator) moves up. The stimulus (control) and response (display) are compatible beyond doubt. In (b) the position of the controls is reversed. Clearly, there is an incompatibility between the stimulus and the response. This situation is different from the scroll pane example given in Figure 3.6 because there is no physical analogy to help the user (can you think of one?). If all elevator control panels had the arrangement in Figure 3.18b, would a population stereotype emerge, as with the light switch example? Well, sort of. People would learn the relationship, because they must. But they would make more errors than if the relationship was based on a correct physical mapping. This particular point has been the subject of considerable experimental testing, dating back to the 1950s (Fitts and Seeger, 1953). See also Newell (1990, 276–278) and Kantowitz and Sorkin (1983, 323–331). The gist of this work is that people take longer and commit more errors if there is a physical misalignment between displays and controls, or between controls and the responses they effect.

This work is important to HCI at the very least to highlight the challenges in designing human-computer systems. The physical analogies that human factors engineers seek out and exploit in designing better systems are few and far between in human-computer interfaces. Sure, there are physical relationships like “mouse right, cursor right,” but considering the diversity of people’s interactions with computers, the tasks with physical analogies are the exception. For example, what is the physical analogy for “file save”? Human-computer interfaces require a different way of thinking. Users need help—a lot of help. The use of metaphor is often helpful.

3.4 Mental models and metaphor

There is more to learning or adapting than simply experiencing. One of the most common ways to learn and adapt is through physical analogy (Norman, 1988, p. 23) or metaphor (Carroll and Thomas, 1982). Once we latch on to a physical understanding of an interaction based on experience, it all makes sense. We’ve experienced it, we know it, it seems natural. With a scroll pane, moving the slider up moves the view up. If the relationship were reversed, moving the slider up would move the content up. We could easily develop a physical sense of slider up → view up or slider up → content up. The up-up in each expression demonstrates the importance of finding a spatially congruent physical understanding. These two analogies require opposite control-display relationships, but either is fine and we could work with one just as easily as with the other, provided implementations were consistent across applications and platforms.

Physical analogies and metaphors are examples of the more general concept of mental models, also known as conceptual models. Mental models are common in HCI. The idea is simple enough: “What is the user’s mental model of . . . ?” An association with human experience is required. HCI’s first mental model was perhaps that of the office or desktop. The desktop metaphor helped users understand the graphical user interface. Today it is hard to imagine the pre-GUI era, but in the late 1970s and early 1980s, the GUI was strange. It required a new way of thinking. Designers exploited the metaphor of the office or desktop to give users a jump-start on the interface (Johnson et al., 1989). And it worked. Rather than learning something new and unfamiliar, users could act out with concepts already understood: documents, folders, filing cabinets, trashcans, the top of the desk, pointing, selecting, dragging, dropping, and so on. This is the essence of mental models.

Implementation models are to be avoided. These are systems that impose on the user a set of interactions that follow the inner workings of an application. Cooper and Reimann give the example of a software-based fax product where the user is paced through a series of agonizing details and dialogs (Cooper and Reimann, 2003, p. 25). Interaction follows an implementation model, rather than the user’s mental model of how to send a fax. The user is prompted for information when it is convenient for the program to receive it, not when it makes sense to the user. Users often have pre-existing experiences with artifacts like faxes, calendars, media players, and so on. It is desirable to exploit these at every opportunity in designing a software-based product. Let’s examine a few other examples in human-computer interfaces.

Toolbars in GUIs are fertile ground for mental models. To keep the buttons small and of a consistent size, they are adorned with an icon rather than a label. An icon is a pictorial representation. In HCI, icons trigger a mental image in the user’s mind, a clue to a real-world experience that is similar to the action associated with the button or tool. Icons in drawing and painting applications provide good examples. Figure 3.19a shows the Tool Palette in Corel’s Paint Shop Pro, a painting and image manipulation application.12 The palette contains 21 buttons, each displaying an icon. Each button is associated with a function and its icon is carefully chosen to elicit the association in the user’s mind. Some are clear, like the magnifying glass or the paintbrush. Some are less clear. Have a look. Can you tell what action is associated with each button? Probably not. But users of this application likely know the meaning of most of these buttons.

12 www.jasc.com.

FIGURE 3.19

Icons create associations. (a) Array of toolbar buttons from Corel’s Paint Shop Pro. (b) Tooltip help for “Picture Tube” icon.

Preparing this example gave me pause to consider my own experience with this toolbar. I use this application frequently, yet some of the buttons are entirely strange to me. In 1991 Apple introduced a method to help users like me. Hover the mouse pointer over a GUI button and a field pops up providing a terse elaboration on the button’s purpose. Apple called the popups balloons, although today they are more commonly known as tooltips or screen tips. Figure 3.19b gives an example for a button in Paint Shop Pro. Apparently, the button’s purpose is related to a picture tube. I’m still in the dark, but I take solace in knowing that I am just a typical user: “Each user learns the smallest set of features that he needs to get his work done, and he abandons the rest.” (Cooper, 1999, p. 33)

Other examples of mental models are a compass and a clock face as metaphors for direction. Most users have an ingrained understanding of a compass and a clock. The inherent labels can serve as mental models for direction. Once there is an understanding that a metaphor is present, the user has a mental model and uses it efficiently and accurately for direction: north for straight ahead or up, west for left, and so on. As an HCI example, Lindeman et al. (2005) used the mental model of a compass to help virtual reality users navigate a building. Users wore a vibro-tactile belt with eight actuators positioned according to compass directions. They were able to navigate the virtual building using a mental model of the compass. There is also a long history in HCI of using a compass metaphor for stylus gestures with pie menus (Callahan et al., 1988) and marking menus (G. P. Kurtenbach, Sellen, and Buxton, 1993; Li, Hinckley, Guan, and Landay, 2005).

With twelve divisions, a clock provides finer granularity than a compass (“obstacle ahead at 2 o’clock!”). Examples in HCI include numeric entry (Goldstein, Chincholle, and Backström, 2000; Isokoski and Käki, 2002; McQueen, MacKenzie, and Zhang, 1995) and locating people and objects in an environment (Sáenz and Sánchez, 2009; A. Sellen, Eardley, Izadi, and Harper, 2006). Using a clock metaphor for numeric entry with a stylus is shown in Figure 3.20. Instead of scripting numbers using Roman characters, the numbers are entered using straight-line strokes. The direction of the stroke is the number’s position on a clock face. In a longitudinal study, McQueen et al. (1995) found that numeric entry was about

FIGURE 3.20

Mental model example: (a) Clock face. (b) Numeric entry with a stylus.

24 percent faster using straight-line strokes compared to handwritten digits. The 12 o’clock position was used for 0. The 10 o’clock and 11 o’clock positions were reserved for system commands.
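The mapping from stroke to digit is pure geometry. Here is a sketch under the conventions just described (12 o’clock is 0; 10 and 11 o’clock are reserved), using screen coordinates where +y points down; the function name is illustrative:

```python
import math

def stroke_to_digit(dx: float, dy: float):
    """Map a straight-line stylus stroke to a digit via the clock metaphor.

    The stroke's direction picks a clock position: 12 o'clock (straight
    up) is 0, positions 1-9 are the digits 1-9, and 10-11 are reserved
    for system commands (returned as None)."""
    angle = math.degrees(math.atan2(dx, -dy)) % 360  # 0 = up, clockwise
    hour = round(angle / 30) % 12                    # nearest clock position
    if hour == 0:
        return 0
    return None if hour >= 10 else hour
```

A stroke straight up gives 0, straight right (3 o’clock) gives 3, and straight down gives 6.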

Sáenz and Sánchez describe a system to assist the blind (Sáenz and Sánchez, 2009) using the clock metaphor. Users carried a mobile locating device that provided spoken audio information about the location of nearby objects (see Figure 3.21a). For the metaphor to work, the user is assumed to be facing the 12 o’clock position. The system allowed users to navigate a building eyes-free (Figure 3.21b). Users could request position and orientation information from the locator. Auditory responses were provided using the clock metaphor and a text-to-speech module (e.g., “door at 3 o’clock”). A similar interface is Rümelin et al.’s NaviRadar (Rümelin, Rukzio, and Hardy, 2012), which uses tactile feedback rather than auditory feedback. Although not specifically using the clock metaphor, NaviRadar leverages users’ spatial sense of their surroundings to aid navigation. Users receive combinations of long and short vibratory pulses to indicate direction (Figure 3.21c). Although the patterns must be learned, the system is simple and avoids auditory feedback, which may be impractical in some situations.

The systems described by Sáenz and Sánchez (2009) and Rümelin et al. (2012) have similar aims yet were presented and evaluated in different ways. Sáenz and Sánchez emphasized and described the system architecture in detail. Although this is of interest to some in the HCI community, from the user’s perspective the system architecture is irrelevant. A user test was reported, but the evaluation was not experimental. There were no independent or dependent variables. Users performed tasks with the system and then responded to questionnaire items, expressing their level of agreement to assertions such as “The software was motivating” or “I like the sounds in the software.” While qualitative assessments are an essential component of any evaluation, the navigation and locating aids described in this work are well suited to experimental testing. Alternative implementations, even minor modifications to the interface, are potential independent variables. Speed (e.g., the time to complete tasks) and accuracy (e.g., the number of wrong turns, retries, direction changes, wall collisions) are potential dependent variables.

FIGURE 3.21

Spatial metaphor: (a) Auditory feedback provides information for locating objects, such as “object at 4 o’clock.” (b) Navigation task. (c) NaviRadar.

(Source: b, adapted from Sáenz and Sánchez, 2009; c, adapted from Rümelin et al., 2012)

Rümelin et al. (2012) took an empirical approach to system tests. Their research included both the technical details of NaviRadar and an evaluation in a formal experiment with independent variables, dependent variables, and so on. The main independent variable included different intensities, durations, and rhythms in the tactile pulses. Since their approach was empirical, valuable analyses were possible. They reported, for example, the deviation of indicated and reported directions and how this varied according to direction and the type of tactile information given. Their approach enables other researchers to study the strengths and weaknesses in NaviRadar in empirical terms and consider methods of improvement.

Chapter 52

Prototyping Tools and Techniques

Michel Beaudouin-Lafon, Université Paris-Sud, mbl@lri.fr Wendy E. Mackay, INRIA, wendy.mackay@inria.fr

1. Introduction

“A good design is better than you think” (Rex Heftman, cited by Raskin, 2000). Design is about making choices. In many fields that require creativity and engineering skill, such as architecture or automobile design, prototypes both inform the design process and help designers select the best solution. This chapter describes tools and techniques for using prototypes to design interactive systems. The goal is to illustrate how they can help designers generate and share new ideas, get feedback from users or customers, choose among design alternatives, and articulate reasons for their final choices.

We begin with our definition of a prototype and then discuss prototypes as design artifacts, introducing four dimensions for analyzing them. We then discuss the role of prototyping within the design process, in particular the concept of a design space, and how it is expanded and contracted by generating and selecting design ideas. The next three sections describe specific prototyping approaches: rapid prototyping, both off-line and on-line, for early stages of design; iterative prototyping, which uses on-line development tools; and evolutionary prototyping, which must be based on a sound software architecture.

What is a prototype?

We define a prototype as a concrete representation of part or all of an interactive system. A prototype is a tangible artifact, not an abstract description that requires interpretation. Designers, as well as managers, developers, customers and end users, can use these artifacts to envision and reflect upon the final system.

Note that prototypes may be defined differently in other fields. For example, an architectural prototype is a scaled-down model of the final building. This is not possible for interactive system prototypes: the designer may limit the amount of information the prototype can handle, but the actual interface must be presented at full scale. Thus, a prototype interface to a database may handle only a small pseudo database but must still present a full-size display and interaction techniques. Full-scale, one-of-a-kind models, such as a hand-made dress sample, are another type of prototype. These usually require an additional design phase in order to mass-produce the final design. Some interactive system prototypes begin as one-of-a-kind models which are then distributed widely (since the cost of duplicating software is so low). However, most successful software prototypes evolve into the final product and then continue to evolve as new versions of the software are released.

Hardware and software engineers often create prototypes to study the feasibility of a technical process. They conduct systematic, scientific evaluations with respect to pre-defined benchmarks and, by systematically varying parameters, fine-tune the system. Designers in creative fields, such as typography or graphic design, create prototypes to express ideas and reflect on them. This approach is intuitive, oriented more to discovery and generation of new ideas than to evaluation of existing ideas.

Human-Computer Interaction is a multi-disciplinary field which combines elements of science, engineering and design (Mackay and Fayard, 1997; Dykstra-Erickson et al., 2001). Prototyping is primarily a design activity, although we use software engineering to ensure that software prototypes evolve into technically sound working systems and we use scientific methods to study the effectiveness of particular designs.

2. Prototypes as design artifacts

We can look at prototypes both as concrete artifacts in their own right and as important components of the design process. When viewed as artifacts, successful prototypes have several characteristics: they support creativity, helping the developer to capture and generate ideas, facilitate the exploration of a design space, and uncover relevant information about users and their work practices. They encourage communication, helping designers, engineers, managers, software developers, customers and users to discuss options and interact with each other. They also permit early evaluation since they can be tested in various ways, including traditional usability studies and informal user feedback, throughout the design process.

We can analyze prototypes and prototyping techniques along four dimensions:

• Representation describes the form of the prototype, e.g., sets of paper sketches or computer simulations;

• Precision describes the level of detail at which the prototype is to be evaluated, e.g., informal and rough or highly polished;

• Interactivity describes the extent to which the user can actually interact with the prototype, e.g., watch-only or fully interactive; and

• Evolution describes the expected life-cycle of the prototype, e.g., throw-away or iterative.
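The four dimensions can be read as a simple record type. The class below is purely illustrative: the field names are the chapter’s dimensions, but the type itself and the example values are assumptions made for the sketch:

```python
from dataclasses import dataclass

@dataclass
class PrototypeProfile:
    """Position of a prototype along the four analytic dimensions."""
    representation: str  # e.g., "off-line" or "on-line"
    precision: str       # e.g., "rough" or "highly polished"
    interactivity: str   # e.g., "watch-only" or "fully interactive"
    evolution: str       # e.g., "throw-away" or "iterative"

# A paper storyboard, profiled along the four dimensions:
storyboard = PrototypeProfile("off-line", "rough", "watch-only", "throw-away")
```

Profiling a prototype this way makes explicit which questions it can answer and which it leaves open.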

2.1 Representation

Prototypes serve different purposes and thus take different forms. A series of quick sketches on paper can be considered a prototype; so can a detailed computer simulation. Both are useful; both help the designer in different ways. We distinguish between two basic forms of representation: off-line and on-line.

Off-line prototypes (also called paper prototypes) do not require a computer. They include paper sketches, illustrated story-boards, cardboard mock-ups and videos. The most salient characteristic of off-line prototypes (of interactive systems) is that they are created quickly, usually in the early stages of design, and they are usually thrown away when they have served their purpose.

On-line prototypes (also called software prototypes) run on a computer. They include computer animations, interactive video presentations, programs written with scripting languages, and applications developed with interface builders. The cost of producing on-line prototypes is usually higher, and may require skilled programmers to implement advanced interaction and/or visualization techniques or to meet tight performance constraints. Software prototypes are usually more effective in the later stages of design, when the basic design strategy has been decided.

In our experience, programmers often argue in favor of software prototypes even at the earliest stages of design. Because they are already familiar with a programming language, these programmers believe it will be faster and more useful to write code than to "waste time" creating paper prototypes. In twenty years of prototyping, in both research and industrial settings, we have yet to find a situation in which this is true.

First, off-line prototypes are very inexpensive and quick. This permits a very rapid iteration cycle and helps prevent the designer from becoming overly attached to the first possible solution. Off-line prototypes make it easier to explore the design space (see section 3.1), examining a variety of design alternatives and choosing the most effective solution. On-line prototypes introduce an intermediary between the idea and the implementation, slowing down the design cycle.

Second, off-line prototypes are less likely to constrain how the designer thinks. Every programming language or development environment imposes constraints on the interface, limiting creativity and restricting the number of ideas considered. If a particular tool makes it easy to create scroll-bars and pull-down menus and difficult to create a zoomable interface, the designer is likely to limit the interface accordingly. Considering a wider range of alternatives, even if the developer ends up using a standard set of interface widgets, usually results in a more creative design.

Finally and perhaps most importantly, off-line prototypes can be created by a wide range of people, not just programmers. Thus all types of designers, technical or otherwise, as well as users, managers and other interested parties, can contribute on an equal basis. Unlike programming, modifying a storyboard or cardboard mock-up requires no particular skill. Collaborating on paper prototypes not only increases participation in the design process, but also improves communication among team members and increases the likelihood that the final design solution will be well accepted.

Although we believe strongly in off-line prototypes, they are not a panacea. In some situations, they are insufficient to fully evaluate a particular design idea. For example, interfaces requiring rapid feedback to users or complex, dynamic visualizations usually require software prototypes. However, particularly when using video and wizard-of-oz techniques, off-line prototypes can be used to create very sophisticated representations of the system.

Prototyping is an iterative process and all prototypes provide information about some aspects while ignoring others. The designer must consider the purpose of the prototype (Houde and Hill, 1997) at each stage of the design process and choose the representation that is best suited to the current design question.

2.2 Precision

Prototypes are explicit representations that help designers, engineers and users reason about the system being built. By their nature, prototypes require details. A verbal description such as "the user opens the file" or "the system displays the results" provides no information about what the user actually does. Prototypes force designers to show the interaction: just how does the user open the file and what are the specific results that appear on the screen?

Precision refers to the relevance of details with respect to the purpose of the prototype.1 For example, when sketching a dialog box, the designer specifies its size, the positions of each field and the titles of each label. However, not all these details are relevant to the goal of the prototype: it may be necessary to show where the labels are, but too early to choose the text. The designer can convey this by writing nonsense words or drawing squiggles, which shows the need for labels without specifying their actual content.

Although it may seem contradictory, a detailed representation need not be precise. This is an important characteristic of prototypes: those parts of the prototype that are not precise are those open for future discussion or for exploration of the design space. Yet they need to be incarnated in some form so the prototype can be evaluated and iterated.

The level of precision usually increases as successive prototypes are developed and more and more details are set. The forms of the prototypes reflect their level of precision: sketches tend not to be precise, whereas computer simulations are usually very precise. Graphic designers often prefer using hand sketches for early prototypes because the drawing style can directly reflect what is precise and what is not: the wiggly shape of an object or a squiggle that represents a label are directly perceived as imprecise. This is more difficult to achieve with an on-line drawing tool or a user-interface builder.

The form of the prototype must be adapted to the desired level of precision. Precision defines the tension between what the prototype states (relevant details) and what the prototype leaves open (irrelevant details). What the prototype states is subject to evaluation; what the prototype leaves open is subject to more discussion and design space exploration.

2.3 Interactivity

An important characteristic of HCI systems is that they are interactive: users both respond to them and act upon them. Unfortunately, designing effective interaction is difficult: many interactive systems (including many web sites) have a good “look” but a poor “feel”. HCI designers can draw from a long tradition in visual design for the former, but have relatively little experience with how interactive software systems should be used: personal computers have only been commonplace for about a decade. Another problem is that the quality of interaction is tightly linked to the end users and a deep understanding of their work practices: a word processor designed for a professional typographer requires a different interaction design than one designed for secretaries, even though ostensibly they serve similar purposes. Designers must take the context of use into account when designing the details of the interaction.

A critical role for an interactive system prototype is to illustrate how the user will interact with the system. While this may seem more natural with on-line prototypes, in fact it is often easier to explore different interaction strategies with off-line prototypes. Note that interactivity and precision are orthogonal dimensions. One can create an imprecise prototype that is highly interactive, such as a series of paper screen images in which one person acts as the user and the other plays the system. Or, one may create a very precise but non-interactive prototype, such as a detailed animation that shows feedback from a specific action by a user.

1 Note that the terms low-fidelity and high-fidelity prototypes are often used in the literature. We prefer the term precision because it refers to the content of the prototype itself, not its relationship to the final, as-yet-undefined system.

Prototypes can support interaction in various ways. For off-line prototypes, one person (often with help from others) plays the role of the interactive system, presenting information and responding to the actions of another person playing the role of the user. For on-line prototypes, parts of the software are implemented, while others are "played" by a person. (This approach, called the Wizard of Oz after the character in the 1939 movie of the same name, is explained in section 4.1.) The key is that the prototype feels interactive to the user.
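The division of labor in a Wizard-of-Oz setup can be sketched as a simple dispatcher: commands that are implemented are handled by software, while everything else is quietly routed to a hidden human operator. The command names and responses below are hypothetical placeholders, not part of any real system.

```python
# Sketch of a Wizard-of-Oz dispatcher: implemented commands are answered
# by software; unimplemented ones are routed to a human "wizard" whose
# replies the user cannot distinguish from the system's.
# All names and responses are hypothetical placeholders.

def make_system(implemented, wizard):
    """implemented: dict mapping commands to response functions.
    wizard: callable standing in for the hidden operator's reply."""
    def respond(command):
        handler = implemented.get(command)
        if handler is not None:
            return handler(command)   # real software path
        return wizard(command)        # a person plays the system
    return respond

# Example: "search" is implemented; "summarize" is played by the wizard.
implemented = {"search": lambda cmd: "3 documents found"}
wizard = lambda cmd: "Here is a summary..."   # operator's typed reply
system = make_system(implemented, wizard)
```

From the user's point of view the prototype feels uniformly interactive; only the design team knows which responses were computed and which were improvised.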

Prototypes can support different levels of interaction. Fixed prototypes, such as video clips or pre-computed animations, are non-interactive: the user cannot interact, or pretend to interact, with them. Fixed prototypes are often used to illustrate or test scenarios (see chapter 53). Fixed-path prototypes support limited interaction. The simplest case is a fixed-path prototype in which each step is triggered by a pre-specified user action. For example, the person controlling the prototype might present the user with a screen containing a menu. When the user points to the desired item, she presents the corresponding screen showing a dialog box. When the user points to the word "OK", she presents the screen that shows the effect of the command. Even though the position of the click is irrelevant (it is used as a trigger), the person in the role of the user can get a feel for the interaction. Of course, this type of prototype can be much more sophisticated, with multiple options at each step. Fixed-path prototypes are very effective with scenarios and can also be used for horizontal and task-based prototypes (see section 3.1).
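The fixed-path behavior described above amounts to a simple state machine: whatever the user does, the action merely triggers the next pre-specified screen, exactly as when a person flips paper screens. As a sketch (the screen names are hypothetical placeholders):

```python
# Sketch of a fixed-path prototype: any user action advances to the
# next pre-specified screen; the content of the action is ignored.
# Screen names are hypothetical placeholders.

class FixedPathPrototype:
    def __init__(self, screens):
        self.screens = screens   # ordered list of screen descriptions
        self.step = 0

    def current_screen(self):
        return self.screens[self.step]

    def user_action(self, event=None):
        """Advance to the next screen; `event` acts only as a trigger."""
        if self.step < len(self.screens) - 1:
            self.step += 1
        return self.current_screen()

demo = FixedPathPrototype(["menu", "dialog box", "command result"])
demo.user_action("click on menu item")   # -> "dialog box"
demo.user_action("click on OK")          # -> "command result"
```

A more sophisticated fixed-path prototype would replace the linear list with a small branching structure, one branch per option at each step.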

Open prototypes support large sets of interactions. Such prototypes work like the real system, with some limitations. They usually only cover part of the system (see vertical prototypes, section 3.1), and often have limited error-handling or reduced performance relative to that of the final system.

Prototypes may thus illustrate or test different levels of interactivity. Fixed prototypes simply illustrate what the interaction might look like. Fixed-path prototypes provide designers and users with the experience of what the interaction might be like, but only in pre-specified situations. Open prototypes allow designers to test a wide range of examples of how users will interact with the system.

2.4 Evolution

Prototypes have different life spans: rapid prototypes are created for a specific purpose and then thrown away; iterative prototypes evolve, either to work out some details (increasing their precision) or to explore various alternatives; and evolutionary prototypes are designed to become part of the final system.

Rapid prototypes are especially important in the early stages of design. They must be inexpensive and easy to produce, since the goal is to quickly explore a wide variety of possible types of interaction and then throw them away. Note that rapid prototypes may be off-line or on-line. Creating precise software prototypes, even if they must be re-implemented in the final version of the system, is important for detecting and fixing interaction problems. Section 4 presents specific prototyping techniques, both off-line and on-line.

Iterative prototypes are developed as a reflection of a design in progress, with the explicit goal of evolving through several design iterations. Designing prototypes that support evolution is sometimes difficult. There is a tension between evolving toward the final solution and exploring an unexpected design direction, which may be adopted or thrown away completely. Each iteration should inform some aspect of the design. Some iterations explore different variations of the same theme. Others may systematically increase precision, working out the finer details of the interaction. Section 5 describes tools and techniques for creating iterative prototypes.

Figure 1

Evolutionary prototypes are a special case of iterative prototypes in which the prototype evolves into part or all of the final system (Fig. 1). Obviously this only applies to software prototypes. Extreme Programming (Beck, 2000) advocates this approach, tightly coupling design and implementation and building the system through constant evolution of its components. Evolutionary prototypes require more planning and practice than the approaches above because the prototypes are both representations of the final system and the final system itself, making it more difficult to explore alternative designs. We advocate a combined approach, beginning with rapid prototypes and then using iterative or evolutionary prototypes according to the needs of the project. Section 6 describes how to create evolutionary prototypes by building upon software architectures specifically designed to support interactive systems.

3. Prototypes and the design process

In the previous section, we looked at prototypes as artifacts, i.e. the results of a design process. Prototypes can also be seen as artifacts for design, i.e. as an integral part of the design process. Prototyping helps designers think: prototypes are the tools they use to solve design problems. In this section we focus on prototyping as a process and its relationship to the overall design process.

User-centered design

The field of Human-Computer Interaction is both user-centered (Norman & Draper, 1986) and iterative. User-centered design places the user at the center of the design process, from the initial analysis of user requirements (see chapters 48-50 in this volume) to testing and evaluation (see chapters 56-59 in this volume). Prototypes support this goal by allowing users to see and experience the final system long before it is built. Designers can identify functional requirements, usability problems and performance issues early and improve the design accordingly.

Iterative design involves multiple design-implement-test loops,2 enabling the designer to generate different ideas and successively improve upon them. Prototypes support this goal by allowing designers to evaluate concrete representations of design ideas and select the best.

Prototypes reveal the strengths as well as the weaknesses of a design. Unlike pure ideas, abstract models or other representations, they can be contextualized to help understand how the real system would be used in a real setting. Because prototypes are concrete and detailed, designers can explore different real-world scenarios and users can evaluate them with respect to their current needs. Prototypes can be compared directly with other, existing systems, and designers can learn about the context of use and the work practices of the end users. Prototypes can help designers (re)analyze the user's needs during the design process, not abstractly as with traditional requirements analysis, but in the context of the system being built.

Participatory design

Participatory (also called Cooperative) Design is a form of user-centered design that actively involves the user in all phases of the design process (see Greenbaum and Kyng, 1991, and chapter 54 in this volume). Users are not simply consulted at the beginning and called in to evaluate the system at the end; they are treated as partners throughout. This early and active involvement of users helps designers avoid unpromising design paths and develop a deeper understanding of the actual design problem. Obtaining user feedback at each phase of the process also changes the nature of the final evaluation, which is used to fine-tune the interface rather than to discover major usability problems.

A common misconception about participatory design is that designers are expected to abdicate their responsibilities as designers, leaving the design to the end user. In fact, the goal is for designers and users to work together, each contributing their strengths to clarify the design problem as well as explore design solutions. Designers must understand what users can and cannot contribute. Usually, users are best at understanding the context in which the system will be used and subtle aspects of the problems that must be solved. Innovative ideas can come from both users and designers, but the designer is responsible for considering a wide range of options that might not be known to the user and balancing the trade-offs among them.

Because prototypes are shared, concrete artifacts, they serve as an effective medium for communication within the design team. We have found that collaborating on prototype design is an effective way to involve users in participatory design. Prototypes help users articulate their needs and reflect on the efficacy of design solutions proposed by designers.

3.1 Exploring the design space

Design is not a natural science: the goal is not to describe and understand existing phenomena but to create something new. Designers do, of course, benefit from scientific research findings and they may use scientific methods to evaluate interactive systems. But designers also require specific techniques for generating new ideas and balancing complex sets of trade-offs, to help them develop and refine design ideas.

2 Software engineers refer to this as the Spiral model (Boehm, 1988).

Designers from fields such as architecture and graphic design have developed the concept of a design space, which constrains design possibilities along some dimensions, while leaving others open for creative exploration. Ideas for the design space come from many sources: existing systems, other designs, other designers, external inspiration and accidents that prompt new ideas. Designers are responsible for creating a design space specific to a particular design problem. They explore this design space, expanding and contracting it as they add and eliminate ideas. The process is iterative: more cyclic than reductionist. That is, the designer does not begin with a rough idea and successively add more precise details until the final solution is reached. Instead, she begins with a design problem, which imposes a set of constraints, and generates a set of ideas to form the initial design space. She then explores this design space, preferably with the user, and selects a particular design direction to pursue. This closes off part of the design space, but opens up new dimensions that can be explored. The designer generates additional ideas along these dimensions, explores the expanded design space, and then makes new design choices. Design principles (e.g., Beaudouin-Lafon and Mackay, 2000) help this process by guiding it in both the exploration and choice phases. The process continues, in a cyclic expansion and contraction of the design space, until a satisfying solution is reached.

All designers work with constraints: not just limited budgets and programming resources, but also design constraints. These are not necessarily bad: one cannot be creative along all dimensions at once. However, some constraints are unnecessary, derived from poor framing of the original design problem. If we consider a design space as a set of ideas and a set of constraints, the designer has two options. She can modify ideas within the specified constraints or modify the constraints to enable new sets of ideas. Unlike traditional engineering, which treats the design problem as a given, designers are encouraged to challenge, and if necessary, change the initial design problem. If she reaches an impasse, the designer can either generate new ideas or redefine the problem (and thus change the constraints). Some of the most effective design solutions derive from a more careful understanding and reframing of the design brief.

Note that all members of the design team, including users, may contribute ideas to the design space and help select design directions from within it. However, it is essential that these two activities are kept separate. Expanding the design space requires creativity and openness to new ideas. During this phase, everyone should avoid criticizing ideas and concentrate on generating as many as possible. Clever ideas, half-finished ideas, silly ideas, impractical ideas: all contribute to the richness of the design space and improve the quality of the final solution. In contrast, contracting the design space requires critical evaluation of ideas. During this phase, everyone should consider the constraints and weigh the trade-offs. Each major design decision must eliminate part of the design space: rejecting ideas is necessary in order to experiment with and refine others and to make progress in the design process. Choosing a particular design direction should spark new sets of ideas, and those new ideas are likely to pose new design problems. In summary, exploring a design space is the process of moving back and forth between creativity and choice.

Prototypes aid designers in both aspects of working with a design space: generating concrete representations of new ideas and clarifying specific design directions. The next two sections describe techniques that have proven most useful in our own prototyping work, both for research and product development.

Expanding the design space: Generating ideas

The most well-known idea generation technique is brainstorming, introduced by Osborn (1957). His goal was to create synergy within the members of a group: ideas suggested by one participant would spark ideas in other participants. Subsequent studies (Collaros and Anderson, 1969; Diehl and Stroebe, 1987) challenged the effectiveness of group brainstorming, finding that aggregates of individuals could produce the same number of ideas as groups. They found certain effects, such as production blocking, free-riding and evaluation apprehension, were sufficient to outweigh the benefits of synergy in brainstorming groups. Since then, many researchers have explored different strategies for addressing these limitations. For our purposes, the quantity of ideas is not the only important measure: the relationships among members of the group are also important. As de Vreede et al. (2000) point out, one should also consider elaboration of ideas, as group members react to each other's ideas.

We have found that brainstorming, including its many variants, is an important group-building exercise in participatory design. Designers may, of course, brainstorm ideas by themselves. But brainstorming in a group is more enjoyable and, if it is a recurring part of the design process, plays an important role in helping group members share and develop ideas together.

The simplest form of brainstorming involves a small group of people. The goal is to generate as many ideas as possible on a pre-specified topic: quantity, not quality, is important. Brainstorming sessions have two phases: the first for generating ideas and the second for reflecting upon them. The initial phase should last no more than an hour. One person should moderate the session, keeping time, ensuring that everyone participates and preventing people from critiquing each other's ideas. Discussion should be limited to clarifying the meaning of a particular idea. A second person records every idea, usually on a flipchart or on a transparency on an overhead projector. After a short break, participants are asked to reread all the ideas, and each person marks their three favorite ideas.

One variation is designed to ensure that everyone contributes, not just those who are verbally dominant. Participants write their ideas on individual cards or post-it notes for a pre-specified period of time. The moderator then reads each idea aloud. Authors are encouraged to elaborate (but not justify) their ideas, which are then posted on a whiteboard or flipchart. Group members may continue to generate new ideas, inspired by the others they hear.

We use a variant of brainstorming that involves prototypes called video brainstorming (Mackay, 2000): participants not only write or draw their ideas, they act them out in front of a video camera (Fig. 2). The goal is the same as other brainstorming exercises, i.e. to create as many new ideas as possible, without critiquing them. The use of video, combined with paper or cardboard mock-ups, encourages participants to actively experience the details of the interaction and to understand each idea from the perspective of the user.

Each video brainstorming idea takes 2-5 minutes to generate and capture, allowing participants to simulate a wide variety of ideas very quickly. The resulting video clips provide illustrations of each idea that are easier to understand (and remember) than hand-written notes. (We find that raw notes from brainstorming sessions are not very useful after a few weeks because the participants no longer remember the context in which the ideas were created.)

Figure 2

Video brainstorming requires thinking more deeply about each idea. It is easier to stay abstract when describing an interaction in words or even with a sketch, but acting out the interaction in front of the camera forces the author of the idea (and the other participants) to seriously consider how a user would interact with the idea. It also encourages designers and users to think about new ideas in the context in which they will be used. Video clips from a video brainstorming session, even though rough, are much easier for the design team, including developers, to interpret than ideas from a standard brainstorming session. We generally run a standard brainstorming session, either oral or with cards, prior to a video brainstorming session, to maximize the number of ideas to be explored. Participants then take their favorite ideas from the previous session and develop them further as video brainstorms. Each person is asked to "direct" at least two ideas, incorporating the hands or voices of other members of the group. We find that, unlike standard brainstorming, video brainstorming encourages even the quietest team members to participate.

Contracting the design space: Selecting alternatives

After expanding the design space by creating new ideas, designers must stop and reflect on the choices available to them. After exploring the design space, designers must evaluate their options and make concrete design decisions: choosing some ideas, specifically rejecting others, and leaving other aspects of the design open to further idea generation activities. Rejecting good, potentially effective ideas is difficult, but necessary to make progress.

Prototypes often make it easier to evaluate design ideas from the user's perspective. They provide concrete representations that can be compared. Many of the evaluation techniques described elsewhere in this handbook can be applied to prototypes, to help focus the design space. The simplest situation is when the designer must choose among several discrete, independent options. Running a simple experiment, using techniques borrowed from Psychology (see chapter 56) allows the designer to compare how users respond to each of the alternatives. The designer builds a prototype, with either fully-implemented or simulated versions of each option. The next step is to construct tasks or activities that are typical of how the system would be used, and ask people from the user population to try each of the options under controlled conditions. It is important to keep everything the same, except for the options being tested.

Designers should base their evaluations on both quantitative measures, such as speed or error rate, and qualitative measures, such as the user's subjective impressions of each option.
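The quantitative side of such a comparison can be sketched very simply: log each trial's design option, completion time and error count, then compute per-option summaries. The option names and trial values below are hypothetical illustrations, not real data.

```python
# Sketch: comparing design options on logged trial data.
# Trial records (option, seconds, errors) are hypothetical examples.

trials = [
    ("menu", 4.2, 0), ("menu", 5.1, 1), ("menu", 4.7, 0),
    ("toolbar", 3.1, 0), ("toolbar", 3.6, 0), ("toolbar", 3.3, 1),
]

def summarize(trials):
    """Compute mean completion time and error rate per design option."""
    stats = {}
    for option, seconds, errors in trials:
        s = stats.setdefault(option, {"n": 0, "time": 0.0, "errors": 0})
        s["n"] += 1
        s["time"] += seconds
        s["errors"] += errors
    return {opt: {"mean_time": s["time"] / s["n"],
                  "error_rate": s["errors"] / s["n"]}
            for opt, s in stats.items()}

results = summarize(trials)
```

Such a summary supports, but does not replace, the qualitative measures: subjective impressions often explain why one option is faster or less error-prone.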

Ideally, of course, one design alternative will be clearly faster, prone to fewer errors and preferred by the majority of users. More often, the results are ambiguous, and the designer must take other factors into account when making the design choice. (Interestingly, running small experiments often highlights other design problems and may help the designer reformulate the design problem or change the design space.)

The more difficult (and common) situation is when the designer faces a complex, interacting set of design alternatives, in which each design decision affects a number of others. Designers can use heuristic evaluation techniques, which rely on our understanding of human cognition, memory and sensory perception (see chapters 1-6). They can also evaluate their designs with respect to ergonomic criteria (see chapter 51) or design principles (Beaudouin-Lafon and Mackay, 2000). See chapters 56-60 for a more thorough discussion of testing and evaluation methods.

Another strategy is to create one or more scenarios (see chapter 53) that illustrate how the combined set of features will be used in a realistic setting. The scenario must identify who is involved, where the activities take place, and what the user does over a specified period of time. Good scenarios involve more than a string of independent tasks; they should incorporate real-world activities, including common or repeated tasks, successful activities and break-downs and errors, with both typical and unusual events. The designer then creates a prototype that simulates or implements the aspects of the system necessary to illustrate each set of design alternatives. Such prototypes can be tested by asking users to "walk through" the same scenario several times, once for each design alternative. As with experiments and usability studies, designers can record both quantitative and qualitative data, depending on the level of the prototypes being tested.

The previous section described an idea-generation technique called video brainstorming, which allows designers to generate a variety of ideas about how to interact with the future system. We call the corresponding technique for focusing in on a design video prototyping. Video prototyping can incorporate any of the rapid-prototyping techniques (off-line or on-line) described in section 4.1. They are quick to build, force designers to consider the details of how users will react to the design in the context in which it will be used, and provide an inexpensive method of comparing complex sets of design decisions. See section 4.1 for more information on how to develop scenarios, storyboard and then videotape them.

To an outsider, video brainstorming and video prototyping techniques look very similar: both involve small design groups working together, creating rapid prototypes and interacting with them in front of a video camera. Both result in video illustrations that make abstract ideas concrete and help team members communicate with each other. The critical difference is that video brainstorming expands the design space, by creating a number of unconnected collections of individual ideas, whereas video prototyping contracts the design space, by showing how a specific collection of design choices work together.

3.2 Prototyping strategies

Designers must decide what role prototypes should play with respect to the final system and in which order to create different aspects of the prototype. The following sections present four strategies: horizontal, vertical, task-oriented and scenario-based, each focusing on different design concerns. These strategies can use any of the prototyping techniques covered in sections 4, 5 and 6.

Horizontal prototypes

The purpose of a horizontal prototype is to develop one entire layer of the design at the same time. This type of prototyping is most common with large software development teams, where designers with different skill sets address different layers of the software architecture. Horizontal prototypes of the user interface are useful to get an overall picture of the system from the user’s perspective and address issues such as consistency (similar functions are accessible through similar user commands), coverage (all required functions are supported) and redundancy (the same function is/is not accessible through different user commands).

User interface horizontal prototypes can begin with rapid prototypes and progress through to working code. Software prototypes can be built with an interface builder (see section 5.1), without creating any of the underlying functionality, making it possible to test how the user will interact with the user interface without worrying about how the rest of the architecture works. However, some level of scaffolding or simulation of the rest of the application is often necessary, otherwise the prototype cannot be evaluated properly. As a consequence, software horizontal prototypes tend to be evolutionary, i.e. they are progressively transformed into the final system.
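A horizontal software prototype of this kind can be sketched as a complete command layer whose handlers are mere stubs: the user can reach every command, but none does any real work. The command names below are hypothetical placeholders.

```python
# Sketch of a horizontal prototype: the entire command layer exists,
# but each handler is a stub that only records that it was invoked.
# Command names are hypothetical placeholders.

log = []

def stub(name):
    def handler():
        log.append(name)                       # record the invocation
        return f"[{name}: not yet implemented]"
    return handler

# Breadth without depth: every user command is present, none is real.
commands = {name: stub(name) for name in
            ["open", "save", "print", "undo", "find"]}

commands["open"]()   # the user can exercise every command, so the
commands["find"]()   # designer can check coverage and consistency
```

The log of invoked stubs doubles as a simple record of which parts of the interface users actually exercise during evaluation.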

Vertical prototypes

The purpose of a vertical prototype is to ensure that the designer can implement the full, working system, from the user interface layer down to the underlying system layer. Vertical prototypes are often built to assess the feasibility of a feature described in a horizontal, task-oriented or scenario-based prototype. For example, when we developed the notion of magnetic guidelines in the CPN2000 system to facilitate the alignment of graphical objects (Beaudouin-Lafon and Mackay, 2000), we implemented a vertical prototype to test not only the interaction technique but also the layout algorithm and the performance. We knew that we could only include the particular interaction technique if we could implement a sufficiently fast response.

Vertical prototypes are generally high precision, software prototypes because their goal is to validate an idea at the system level. They are often thrown away because they are generally created early in the project, before the overall architecture has been decided, and they focus on only one design question. For example, a vertical prototype of a spelling checker for a text editor does not require text editing functions to be implemented and tested. However, the final version will need to be integrated into the rest of the system, which may involve considerable architectural or interface changes.
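The spelling-checker example can be sketched as a vertical slice: the checking function is implemented in full depth, while no text-editing functionality exists at all. The tiny word list is a hypothetical stand-in for a real dictionary.

```python
# Sketch of a vertical prototype: a spelling checker implemented in
# depth, with no surrounding text editor. The word set is a tiny
# hypothetical stand-in for a real dictionary.

WORDS = {"the", "quick", "brown", "fox", "jumps"}

def misspelled(text):
    """Return the words of `text` not found in the dictionary,
    in order of appearance; trailing punctuation is ignored."""
    return [w for w in text.lower().split()
            if w.strip(".,;!?") not in WORDS]

misspelled("The quick brwon fox jmups")   # -> ["brwon", "jmups"]
```

This slice is deep enough to evaluate the feature in isolation, but as the text notes, integrating it into a real editor later may require substantial architectural changes.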

Task-oriented prototypes

Many user interface designers begin with a task analysis (see chapter 48), to identify the individual tasks that the user must accomplish with the system. Each task requires a corresponding set of functionality from the system. Task-based prototypes are organized as a series of tasks, which allows both designers and users to test each task independently, systematically working through the entire system. Task-oriented prototypes include only the functions necessary to implement the specified set of tasks. They combine the breadth of horizontal prototypes, to cover the functions required by those tasks, with the depth of vertical prototypes, enabling detailed analysis of how the tasks can be supported. Depending on the goal of the prototype, both off-line and on-line representations can be used for task-oriented prototypes.

Scenario-based prototypes

Scenario-based prototypes are similar to task-oriented ones, except that they do not stress individual, independent tasks, but rather follow a more realistic scenario of how the system would be used in a real-world setting. Scenarios are stories that describe a sequence of events and how the user reacts (see chapter 53). A good scenario includes both common and unusual situations, and should explore patterns of activity over time. Bødker (1995) has developed a checklist to ensure that no important issues have been left out.

We find it useful to begin with use scenarios based on observations of or interviews with real users. Ideally, some of those users should participate in the creation of the specific scenarios, and other users should critique them based on how realistic they are. Use scenarios are then turned into design scenarios, in which the same situations are described but with the functionality of the new system. Design scenarios are used, among other things, to create scenario-based video prototypes or software prototypes. As with task-based prototypes, the developer needs to write only the software necessary to illustrate the components of the design scenario. The goal is to create a situation in which the user can experience what the system would be like in a realistic situation, even if it addresses only a subset of the planned functionality.

Section 4 describes a variety of rapid prototyping techniques which can be used in any of these four prototyping strategies. We begin with off-line rapid prototyping techniques, followed by on-line prototyping techniques.

4. Rapid prototypes

The goal of rapid prototyping is to develop prototypes very quickly, in a fraction of the time it would take to develop a working system. By shortening the prototype-evaluation cycle, the design team can evaluate more alternatives and iterate the design several times, improving the likelihood of finding a solution that successfully meets the user's needs.

How rapid is rapid depends on the context of the particular project and the stage in the design process. Early prototypes, e.g. sketches, can be created in a few minutes. Later in the design cycle, a prototype produced in less than a week may still be considered “rapid” if the final system is expected to take months or years to build. Precision, interactivity and evolution all affect the time it takes to create a prototype. Not surprisingly, a precise and interactive prototype takes more time to build than an imprecise or fixed one.

The techniques presented in this section are organized from most rapid to least rapid, according to the representation dimension introduced in section 2. Off-line techniques are generally more rapid than on-line ones. However, creating successive iterations of an on-line prototype may end up being faster than creating new off-line prototypes.

4.1 Off-line rapid prototyping techniques

Off-line prototyping techniques range from simple to very elaborate. Because they do not involve software, they are usually considered a tool for thinking through the design issues, to be thrown away when they are no longer needed. This section describes simple paper and pencil sketches, three-dimensional mock-ups, wizard-of-oz simulations and video prototypes.

Paper & pencil

The fastest form of prototyping involves paper, transparencies and post-it notes to represent aspects of an interactive system (for an example, see Muller, 1991). By playing the roles of both the user and the system, designers can quickly explore a wide variety of layout and interaction alternatives.

Designers can create a variety of low-cost "special effects". For example, a tiny triangle drawn at the end of a long strip cut from an overhead transparency makes a handy mouse pointer, which can be moved by a colleague in response to the user's actions. Post-it notes™, with prepared lists, can provide "pop-up menus". An overhead projector pointed at a whiteboard makes it easy to project transparencies (hand-drawn or pre-printed, overlaid on each other as necessary) to create an interactive display on the wall. The user can interact by pointing (Fig. 3) or drawing on the whiteboard. One or more people can watch the user and move the transparencies in response to her actions. Everyone in the room gets an immediate impression of how the eventual interface might look and feel.

Figure 3

Note that most paper prototypes begin with quick sketches on paper, then progress to more carefully-drawn screen images made with a computer (Fig. 4). In the early stages, the goal is to generate a wide range of ideas and expand the design space, not determine the final solution. Paper and pencil prototypes are an excellent starting point for horizontal, task-based and scenario-based prototyping strategies.

Figure 4

Mock-ups

Architects use mock-ups or scaled prototypes to provide three-dimensional illustrations of future buildings. Mock-ups are also useful for interactive system designers, helping them move beyond two-dimensional images drawn on paper or transparencies (see Bødker et al., 1988). Generally made of cardboard, foamcore or other found materials, mock-ups are physical prototypes of the new system. Fig. 5 shows a mock-up of the interface to a new hand-held device. The mock-up provides a deeper understanding of how the interaction will work in real-world situations than is possible with sets of screen images.

Figure 5

Figure 6

Wizard of Oz

Sometimes it is useful to give users the impression that they are working with a real system, even before it exists. Kelley (1993) dubbed this technique the Wizard of Oz, based on the scene in the 1939 movie of the same name. The heroine, Dorothy, and her companions ask the mysterious Wizard of Oz for help. When they enter the room, they see an enormous green human head, breathing smoke and speaking with a deep, impressive voice. When they return later, they again see the Wizard. This time, Dorothy's small dog pulls back a curtain, revealing a frail old man pulling levers and making the mechanical Wizard of Oz speak. They realize that the impressive being before them is not a wizard at all, but simply an interactive illusion created by the old man.

The software version of the Wizard of Oz operates on the same principle. A user sits at a terminal and interacts with a program. Hidden elsewhere, the software designer (the wizard) watches what the user does and, by responding in different ways, creates the illusion of a working software program. In some cases, the user is unaware that a person, rather than a computer, is operating the system.

The Wizard-of-Oz technique lets users interact with partially-functional computer systems. Whenever they encounter something that has not been implemented (or there is a bug), a human developer who is watching the interaction overrides the prototype system and plays the role that the computer will eventually play. A combination of video and software can work well, depending upon what needs to be simulated.

The Wizard of Oz was initially used to develop natural language interfaces (e.g. Chapanis, 1982, Wixon, Whiteside, Good and Jones, 1993). Since then, the technique has been used in a wide variety of situations, particularly those in which rapid responses from users are not critical. Wizard of Oz simulations may consist of paper prototypes, fully-implemented systems and everything in between.
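
The mechanics of such a setup are simple to sketch. The following Python fragment is purely illustrative: the class name and the canned replies are invented, and in a real study a hidden human would type each reply live. It routes the user's input to the wizard and presents the wizard's answer as if the system had produced it:

```python
from collections import deque

class WizardOfOzHarness:
    """Toy Wizard-of-Oz harness: the 'system' reply actually comes from
    a hidden human wizard. Replies are canned here so the sketch is
    self-contained."""

    def __init__(self, wizard_replies):
        self.wizard_replies = deque(wizard_replies)
        self.transcript = []  # logged for later analysis

    def user_says(self, text):
        self.transcript.append(("user", text))
        # In a live session the wizard watches this input and types a reply.
        reply = self.wizard_replies.popleft()
        self.transcript.append(("system", reply))
        return reply

harness = WizardOfOzHarness(["There are 3 flights to Paris tomorrow."])
answer = harness.user_says("Which flights go to Paris tomorrow?")
```

The transcript is the interesting by-product: it records exactly what users asked for, which is what the eventual natural-language component must support.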

Video prototyping

Video prototypes (Mackay, 1988) use video to illustrate how users will interact with the new system. As explained in section 3.1, they differ from video brainstorming in that the goal is to refine a single design, not generate new ideas.

Video prototypes may build on paper & pencil prototypes and cardboard mock-ups, and can also use existing software and images of real-world settings.

We begin our video prototyping exercises by reviewing relevant data about users and their work practices, and then review the ideas generated during video brainstorming. The next step is to create a use scenario describing the user at work. Once the scenario is described in words, the designer develops a storyboard. Similar to a comic book, the storyboard shows a sequence of rough sketches of each action or event, with accompanying dialog (or subtitles) and annotations that explain what is happening in the scene or the type of shot (Fig. 7). A paragraph of text in a scenario corresponds to about a page of a storyboard.

Figure 7: Storyboard. This storyboard is based on observations of real Coloured Petri Net users in a small company and illustrates how the CPN developer modifies a particular element of a net, the "Simple Protocol".

Storyboards help designers refine their ideas, generate 'what if' scenarios for different approaches to a story, and communicate with the other people involved in creating the production. Storyboards may be informal "sketches" of ideas, with only partial information; others follow a pre-defined format and are used to direct the production and editing of a video prototype. Designers should jot down notes on storyboards as they think through the details of the interaction.

Storyboards can be used like comic books to communicate with other members of the design team: designers and users can discuss the proposed system and alternative ideas for interacting with it (Fig. 8). Simple videos of each successive frame, with a voice-over to explain what happens, can also be effective. However, we usually use storyboards to help us shoot video prototypes, which illustrate how a new system will look to a user in a real-world setting. We find that placing the elements of a storyboard on separate cards and arranging them (Mackay and Pagani, 1994) helps the designer experiment with different linear sequences and insert or delete video clips. The process of creating a video prototype based on the storyboard provides an even deeper understanding of the design.

Figure 8

The storyboard guides the shooting of the video. We often use a technique called "editing-in-the-camera" (see Mackay, 2000), which allows us to create the video directly, without editing later. We use title cards, as in a silent movie, to separate the clips and to make it easier to shoot. A narrator explains each event and several people may be necessary to illustrate the interaction. Team members enjoy playing with special effects, such as "time-lapse photography". For example, we can record a user pressing a button, stop the camera, add a new dialog box, and then restart the camera, to create the illusion of immediate system feedback.

Video is not simply a way to capture events in the real world or to capture design ideas, but can be a tool for sketching and visualizing interactions. We use a second live video camera as a Wizard-of-Oz tool. The wizard should have access to a set of prototyping materials representing screen objects. Other team members stand by, ready to help move objects as needed. The live camera is pointed at the wizard's work area, with either a paper prototype or a partially-working software simulation. The resulting image is projected onto a screen or monitor in front of the user. One or more people should be situated so that they can observe the actions of the user and manipulate the projected video image accordingly. This is most effective if the wizard is well prepared for a variety of events and can present semi-automated information. The user interacts with the objects on the screen as the wizard moves the relevant materials in direct response to each user action. The other camera records the interaction between the user and the simulated software system on the screen or monitor, to create either a video brainstorm (for a quick idea) or a fully-storyboarded video prototype.

Figure 9

Fig. 9 shows a Wizard-of-Oz simulation with a live video camera, video projector, whiteboard, overhead projector and transparencies. The setup allows two people to experience how they would communicate via a new interactive communication system. One video camera films the blond woman, who can see and talk to the brunette. Her image is projected live onto the left side of the wall. An overhead projector displays hand-drawn transparencies, manipulated by two other people, in response to gestures made by the brunette. The entire interaction is videotaped by a second video camera.

Combining wizard-of-oz and video is a particularly powerful prototyping technique because it gives the person playing the user a real sense of what it might actually feel like to interact with the proposed tool, long before it has been implemented. Seeing a video clip of someone else interacting with a simulated tool is more effective than simply hearing about it; but interacting with it directly is more powerful still. Video prototyping may act as a form of specification for developers, enabling them to build the precise interface, both visually and interactively, created by the design team.

4.2 On-line rapid prototyping techniques

The goal of on-line rapid prototyping is to create higher-precision prototypes than can be achieved with off-line techniques. Such prototypes may prove useful to better communicate ideas to clients, managers, developers and end users. They are also useful for the design team to fine-tune the details of a layout or an interaction. They may exhibit problems in the design that were not apparent in less precise prototypes. Finally, they may be used early in the design process for low-precision prototypes that would be difficult to create off-line, such as when very dynamic interactions or visualizations are needed.

The techniques presented in this section are sorted by interactivity. We start with non-interactive simulations, i.e. animations, followed by interactive simulations that provide fixed or multiple-path interactions. We finish with scripting languages, which support open interactions.

Non-interactive simulations

A non-interactive simulation is a computer-generated animation that represents what a person would see of the system if he or she were watching over the user’s shoulder. Non-interactive simulations are usually created when off-line prototypes, including video, fail to capture a particular aspect of the interaction and it is important to have a quick prototype to evaluate the idea. It's usually best to start by creating a storyboard to describe the animation, especially if the developer of the prototype is not a member of the design team.

One of the most widely-used tools for non-interactive simulations is Macromedia Director™. The designer defines graphical objects called sprites, and defines paths along which to animate them. The succession of events, such as when sprites appear and disappear, is determined with a time-line. Sprites are usually created with drawing tools, such as Adobe Illustrator or Deneba Canvas, painting tools, such as Adobe Photoshop, or even scanned images. Director is a very powerful tool; experienced developers can create sophisticated interactive simulations. However, non-interactive simulations are much faster to create. Other similar tools exist on the market, including Abvent Katabounga, Adobe AfterEffects and Macromedia Flash (Fig. 10).
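
This time-line model is easy to emulate. The Python sketch below is illustrative only (it is not Director's actual API): each sprite has an entry frame, an exit frame and a motion path, and a small function computes what is visible at a given frame:

```python
def sprite(name, start, end, path):
    """A sprite is visible from frame `start` to frame `end`;
    `path(t)` gives its position at frame t."""
    return {"name": name, "start": start, "end": end, "path": path}

def frame_contents(sprites, t):
    """Return the (name, position) of every sprite visible at frame t,
    as a time-line-driven player would when rendering that frame."""
    return [(s["name"], s["path"](t))
            for s in sprites if s["start"] <= t <= s["end"]]

movie = [
    sprite("cursor", 0, 100, lambda t: (2 * t, 50)),  # moves to the right
    sprite("dialog", 40, 100, lambda t: (120, 80)),   # appears at frame 40
]
```

Playing the animation amounts to evaluating `frame_contents` for successive values of `t`; the dialog box "pops up" simply because its entry frame is reached.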

Figure 10

Figure 11 shows a set of animation movies created by Dave Curbow to explore the notion of accountability in computer systems (Dourish, 1997). These prototypes explore new ways to inform the user of the progress of a file copy operation. They were created with Macromind Director by combining custom-made sprites with sprites extracted from snapshots of the Macintosh Finder. The simulation features cursor motion, icons being dragged, windows opening and closing, etc. The result is a realistic prototype that shows how the interface looks and behaves, and that was created in just a few hours. Note that the simulation also features text annotations to explain each step, which helps document the prototype.

Figure 11

Non-interactive animations can be created with any tool that generates images. For example, many Web designers use Adobe Photoshop to create simulations of their web sites. Photoshop images are composed of various layers that overlap like transparencies. The visibility and relative position of each layer can be controlled independently. Designers can quickly add or delete visual elements, simply by changing the characteristics of the relevant layer. This permits quick comparisons of alternative designs and helps visualize multiple pages that share a common layout or banner. Skilled Photoshop users find this approach much faster than most web authoring tools.

We used this technique in the CPN2000 project (Mackay et al., 2000) to prototype the use of transparency. After several prototyping sessions with transparencies and overhead projectors, we moved to the computer to understand the differences between the physical transparencies and the transparent effect as it would be rendered on a computer screen. We later developed an interactive prototype with OpenGL, which required an order of magnitude more time to implement than the Photoshop mock-up.

Interactive simulations

Designers can also use tools such as Adobe Photoshop to create Wizard-of-Oz simulations. For example, the effect of dragging an icon with the mouse can be obtained by placing the icon of a file in one layer and the icon of the cursor in another layer, and by moving either or both layers. The visibility of layers, as well as other attributes, can also create more complex effects. As with Wizard-of-Oz and other paper prototyping techniques, the behavior of the interface is generated by the person operating the Photoshop interface.

More specialized tools, such as Hypercard and Macromedia Director, can be used to create simulations that the user can directly interact with. Hypercard (Goodman, 1987) is one of the most successful early prototyping tools. It is an authoring environment based on a stack metaphor: a stack contains a set of cards that share a background, including fields and buttons. Each card can also have its own unique contents, including fields and buttons (Fig. 12). Stacks, cards, fields and buttons react to user events, e.g. clicking a button, as well as system events, e.g. when a new card is displayed or about to disappear (Fig. 13). Hypercard responds to these events with handlers programmed in a scripting language called Hypertalk. For example, the following script is assigned to a button; it switches to the next card in the stack whenever the button is clicked. If this button is included in the stack background, the user will be able to browse through the entire stack:

    on mouseUp
      go to next card
    end mouseUp

Figure 12

Figure 13


Interfaces can be prototyped quickly with this approach, by drawing different states in successive cards and using buttons to switch from one card to the next. Multiple-path interactions can be programmed by using several buttons on each card. More open interactions require more advanced use of the scripting language, but are fairly easy to master with a little practice.
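
The card-and-button model behind this approach can be sketched in a few lines of Python (the names are hypothetical; Hypercard itself is scripted in Hypertalk): a background "Next" button steps through the cards in order, while card-specific buttons implement multiple-path interactions:

```python
class Stack:
    """Minimal Hypercard-like stack: cards in a fixed order, plus
    per-card buttons that jump to a named card."""

    def __init__(self, card_names):
        self.order = list(card_names)
        self.buttons = {}          # (card, button label) -> target card
        self.current = self.order[0]

    def add_button(self, card, label, target):
        self.buttons[(card, label)] = target

    def click(self, label):
        if label == "Next":        # background button shared by all cards
            i = self.order.index(self.current)
            self.current = self.order[(i + 1) % len(self.order)]
        else:                      # card-specific button: multiple paths
            self.current = self.buttons[(self.current, label)]
        return self.current

stack = Stack(["home", "search", "results"])
stack.add_button("home", "Search", "search")
```

Each card is a drawn interface state; clicking simply swaps which state is displayed, which is exactly why this style of tool supports rapid prototyping.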

Director uses a different metaphor, attaching behaviors to sprites and to frames of the animation. For example, a button can be defined by attaching a behavior to the sprite representing that button. When the sprite is clicked, the animation jumps to a different sequence. This is usually coupled with a behavior attached to the frame containing the button that loops the animation on the same frame. As a result, nothing happens until the user clicks the button, at which point the animation skips to a sequence where, for example, a dialog box opens. The same technique can be used to make the OK and Cancel buttons of the dialog box interactive. Typically, the Cancel button would skip to the original frame while the OK button would skip to a third sequence. Director comes with a large library of behaviors to describe such interactions, so that prototypes can be created completely interactively. New behaviors can also be defined with a scripting language called Lingo.

Many educational and cultural CD-ROMs are created exclusively with Director. They often feature original visual displays and interaction techniques that would be almost impossible to create with the traditional user interface development tools described in section 5. Designers should consider tools like Hypercard and Director as user interface builders or user interface development environments. In some situations, they can even be used for evolutionary prototypes (see section 6).

Scripting languages

Scripting languages are the most advanced rapid prototyping tools. As with the interactive-simulation tools described above, the distinction between rapid prototyping tools and development tools is not always clear. Scripting languages make it easy to develop throw-away prototypes quickly (in a few hours to a few days); the resulting code may or may not be used in the final system, for performance or other technical reasons.

A scripting language is a programming language that is both lightweight and easy to learn. Most scripting languages are interpreted or semi-compiled, i.e. the user does not need to go through a compile-link-run cycle each time the script (program) is changed. Scripting languages are also forgiving: they are not strongly typed, and non-fatal errors are ignored unless explicitly trapped by the programmer. Scripting languages are often used to write small applications for specific purposes and can serve as glue between pre-existing applications or software components. Tcl (Ousterhout, 1993) was inspired by the syntax of the Unix shell; it makes it very easy to interface existing applications by turning the application programming interface (API) into a set of commands that can be called directly from a Tcl script.

Tcl is particularly suitable for developing user interface prototypes (or small to medium-size applications) because of its Tk user interface toolkit. Tk features all the traditional interactive objects (called "widgets") of a UI toolkit: buttons, menus, scrollbars, lists, dialog boxes, etc. A widget can typically be created with a single line of code. For example:

button .dialogbox.ok -text OK -command {destroy .dialogbox}

This command creates a button called ".dialogbox.ok", whose label is "OK". It deletes its parent window ".dialogbox" when the button is pressed. A traditional programming language and toolkit would require 5 to 20 lines of code to create the same button.

Tk also provides two advanced, heavily-parameterized widgets: the text widget and the canvas widget. The text widget can be used to prototype text-based interfaces. Any character in the text can react to user input through the use of tags. For example, it is possible to turn a string of characters into a hypertext link. In Beaudouin-Lafon (2000), the text widget was used to prototype a new method for finding and replacing text. As the user enters the search string, all occurrences of the string are highlighted in the text (Fig. 14). Once a replace string has been entered, clicking an occurrence replaces it (the highlighting changes from yellow to red). Clicking a replaced occurrence returns it to its original value. This example also uses the canvas widget to create a custom scrollbar that displays the positions and status of the occurrences.

Figure 14

The Tk canvas widget is a drawing surface that can contain arbitrary objects: lines, rectangles, ovals, polygons, text strings, and widgets. Tags associate behaviors (i.e. scripts) with objects; the scripts are called when the user acts on those objects. For example, an object that can be dragged will be assigned a tag with three behaviors: button-press, mouse-move and button-up. Because of the flexibility of the canvas, advanced visualization and interaction techniques can be implemented more quickly and easily than with other tools. For example, Fig. 15 shows a prototype exploring new ideas for managing overlapping windows on the screen (Beaudouin-Lafon, 2001). Windows can be stacked and slightly rotated so that they are easier to recognize, and they can be folded so it is possible to see what is underneath without having to move the window. Even though the prototype is not perfect (for example, folding a window that contains text is not properly supported), it was instrumental in identifying a number of problems with the interaction techniques and finding appropriate solutions through iterative design.
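
The tag mechanism can be illustrated with a toy model. The Python sketch below uses invented names (it is not the real Tk API): events are dispatched to the handlers bound to the tags of the item under the cursor, and dragging is implemented with the press and move behaviors described above (the button-up event here simply ends the grab):

```python
class MiniCanvas:
    """Toy model of canvas items with tag bindings. A pressed item
    stays 'grabbed' until release, so a drag keeps tracking it."""

    def __init__(self):
        self.items = []            # drawn bottom to top
        self.bindings = {}         # (tag, event name) -> handler
        self.grabbed = None

    def create_rect(self, x, y, w, h, tags=()):
        item = {"x": x, "y": y, "w": w, "h": h, "tags": set(tags)}
        self.items.append(item)
        return item

    def tag_bind(self, tag, event, handler):
        self.bindings[(tag, event)] = handler

    def _pick(self, px, py):
        # Topmost item under the cursor, like a canvas picking service
        for item in reversed(self.items):
            if item["x"] <= px < item["x"] + item["w"] \
                    and item["y"] <= py < item["y"] + item["h"]:
                return item
        return None

    def event(self, name, px, py):
        item = self.grabbed if self.grabbed is not None else self._pick(px, py)
        if item is not None:
            for tag in item["tags"]:
                handler = self.bindings.get((tag, name))
                if handler:
                    handler(item, px, py)
        if name == "press":
            self.grabbed = item
        elif name == "release":
            self.grabbed = None

# The behaviors that make an item draggable
drag = {"dx": 0, "dy": 0}

def on_press(item, px, py):
    drag["dx"], drag["dy"] = px - item["x"], py - item["y"]

def on_move(item, px, py):
    item["x"], item["y"] = px - drag["dx"], py - drag["dy"]

canvas = MiniCanvas()
box = canvas.create_rect(0, 0, 10, 10, tags=("draggable",))
canvas.tag_bind("draggable", "press", on_press)
canvas.tag_bind("draggable", "move", on_move)
canvas.event("press", 5, 5)
canvas.event("move", 25, 15)
canvas.event("release", 25, 15)
```

After this sequence the rectangle has followed the cursor to (20, 10), keeping the offset at which it was grabbed, which is exactly the interaction the three tag behaviors describe.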

Figure 15

Tcl and Tk can also be used with other programming languages. For example, Pad++ (Bederson & Meyer, 1998) is implemented as an extension to Tcl/Tk: the zoomable interface is implemented in C for performance, and is accessible from Tk as a new widget. This makes it easy to prototype interfaces that use zooming. It is also a way to develop evolutionary prototypes: a first prototype is implemented completely in Tcl, then parts of it are re-implemented in a compiled language to improve performance. Ultimately, the complete system may be implemented in another language, although it is more likely that some parts will remain in Tcl.

Software prototypes can also be used in conjunction with hardware prototypes. Figure 16 shows an example of a hardware prototype that captures hand-written text from a paper flight strip (using a combination of a graphics tablet and a custom-designed system for detecting the position of the paper strip holder). We used Tcl/Tk, in conjunction with C++, to present information on a RADAR screen (tied to an existing air traffic control simulator) and to provide feedback on a touch-sensitive display next to the paper flight strips (Caméléon, Mackay et al., 1998). The user can write in the ordinary way on the paper flight strip, and the system interprets the gestures according to the location of the writing on the strip. For example, a change in flight level is automatically sent to another controller for confirmation, and a physical tap on the strip's ID lights up the corresponding aircraft on the RADAR screen.

Figure 16

5. Iterative prototypes

Prototypes may also be developed with traditional software development tools. In particular, high-precision prototypes usually require a level of performance that cannot be achieved with the rapid on-line prototyping techniques described above. Similarly, evolutionary prototypes intended to evolve into the final product require more traditional software development tools. Finally, even shipped products are not "final", since subsequent releases can be viewed as initial designs for prototyping the next release.

Development tools for interactive systems have been in use for over twenty years and are constantly being refined. Several studies have shown that the user interface accounts for 50% to 80% of the total development cost of an application (Myers & Rosson, 1992). The goal of development tools is to shift this balance by reducing production and maintenance costs. Another goal of development tools is to anticipate the evolution of the system over successive releases and support iterative design.

Interactive systems are inherently more powerful than non-interactive ones (see Wegner, 1997, for a theoretical argument). They do not fit the traditional, purely algorithmic, model of programming: an interactive system must handle user input and generate output at almost any time, whereas an algorithmic system reads input at the beginning, processes it, and displays results at the end. In addition, interactive systems must process input and output at rates that are compatible with the human perception-action loop, i.e. in time frames of 20 ms to 200 ms. In practice, interactive systems are both reactive and real-time systems, two active areas in computer science research.

The need to develop interactive systems more efficiently has led to two interrelated streams of work. The first involves the creation of software tools, from low-level user-interface libraries and toolkits to high-level user interface development environments (UIDEs). The second addresses software architectures for interactive systems: how system functions are mapped onto software modules. The rest of this section presents the most salient contributions of these two streams of work.

5.1 Software tools

Since the advent of graphical user interfaces in the eighties, a large number of tools have been developed to help with the creation of interactive software, most aimed at visual interfaces. This section presents a collection of tools, from low-level tools that require a lot of programming to high-level tools.

The lowest-level tools are graphical libraries, which provide hardware-independence for painting pixels on a screen and handling user input, and window systems, which provide an abstraction (the window) to structure the screen into several "virtual terminals". User interface toolkits structure an interface as a tree of interactive objects called widgets, while user interface builders provide an interactive application to create and edit those widget trees. Application frameworks build on toolkits and UI builders to facilitate the creation of typical functions such as cut/copy/paste, undo, help and interfaces based on editing multiple documents in separate windows. Model-based tools semi-automatically derive an interface from a specification of the domain objects and functions to be supported. Finally, user interface development environments or UIDEs provide an integrated collection of tools for the development of interactive software.

Before we describe each of these categories in more detail, it is important to understand how they can be used for prototyping. It is not always best to use the highest-level available tool for prototyping. High-level tools are most valuable in the long term because they make it easier to maintain the system, port it to various platforms or localize it to different languages. These issues are irrelevant for vertical and throw-away prototypes, so a high-level tool may prove less effective than a lower-level one.

The main disadvantage of higher-level tools is that they constrain or stereotype the types of interfaces they can implement. User interface toolkits usually contain a limited set of “widgets” and it is expensive to create new ones. If the design must incorporate new interaction techniques, such as bimanual interaction (Kurtenbach et al., 1997) or zoomable interfaces (Bederson & Hollan, 1994), a user interface toolkit will hinder rather than help prototype development. Similarly, application frameworks assume a stereotyped application with a menu bar, several toolbars, a set of windows holding documents, etc. Such a framework would be inappropriate for developing a game or a multimedia educational CD-ROM that requires a fluid, dynamic and original user interface.

Finally, developers need to truly master these tools, especially when prototyping in support of a design team. Success depends on the programmer's ability to quickly change the details as well as the overall structure of the prototype. A developer will be more productive when using a familiar tool than if forced to use a more powerful but unknown tool.

Graphical libraries and window systems

Graphical libraries underlie all the other tools presented in this section. Their main purpose is to provide the developer with a hardware-independent, and sometimes cross-platform, application programming interface (API) for drawing on the screen. They can be separated into two categories: direct drawing and scene-graph based. Direct drawing libraries provide functions to draw shapes on the screen once their geometry and graphical attributes are specified. This means that every time something is to be changed on the display, the programmer has to either redraw the whole screen or figure out exactly which parts have changed. Xlib on Unix systems, Quickdraw on MacOS, Win32 GDI on Windows and OpenGL (Woo et al., 1997) on all three platforms are all direct drawing libraries. They offer the best compromise between performance and flexibility, but are difficult to program.

Scene-graph based libraries explicitly represent the contents of the display by a structure called a scene graph. It can be a simple list (called a display list), a tree (as used by many user interface toolkits – see next subsection), or a directed acyclic graph (DAG). Rather than painting on the screen, the developer creates and updates the scene graph, and the library is responsible for updating the screen to reflect the scene graph. Scene graphs are mostly used for 3D graphics, e.g. OpenInventor (Strass, 1993), but in recent years they have been used for 2D as well (Bederson et al., 2000, Beaudouin-Lafon & Lassen, 2000). With the advent of hardware-accelerated graphics cards, scene-graph based libraries can offer outstanding performance while easing the task of the developer.
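
The difference from direct drawing can be made concrete with a minimal sketch (Python, illustrative only; real scene-graph libraries are far richer): the application edits the graph, and the library decides when a repaint is actually needed:

```python
class Node:
    """A scene-graph node: a name, children, and a visibility flag."""
    def __init__(self, name, children=(), visible=True):
        self.name, self.children, self.visible = name, list(children), visible

class SceneGraph:
    """The application mutates the graph; the 'library' repaints only
    when the graph has actually changed (tracked with a dirty flag)."""

    def __init__(self, root):
        self.root, self.dirty = root, True

    def set_visible(self, node, visible):
        node.visible = visible
        self.dirty = True          # mark for redraw

    def render(self):
        if not self.dirty:
            return None            # nothing changed: no repaint needed
        painted = []
        def walk(node):
            if node.visible:
                painted.append(node.name)
                for child in node.children:
                    walk(child)
        walk(self.root)
        self.dirty = False
        return painted

dialog = Node("dialog")
scene = SceneGraph(Node("root", [Node("toolbar"), dialog]))
first = scene.render()             # initial paint
second = scene.render()            # no change, so no repaint
scene.set_visible(dialog, False)   # application edits the graph...
third = scene.render()             # ...and the library repaints without it
```

The programmer never redraws anything directly: hiding the dialog is a one-line edit to the graph, and the library works out what the screen should now show.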

Window systems provide an abstraction to allow multiple client applications to share the same screen. Applications create windows and draw into them. From the application perspective, windows are independent and behave as separate screens. All graphical libraries include or interface with a window system. Window systems also offer a user interface to manipulate windows (move, resize, close, change stacking order, etc.), called the window manager. The window manager may be a separate application (as in X-Windows), or it may be built into the window system (as in Windows), or it may be controlled by each application (as in MacOS). Each solution offers a different trade-off between flexibility and programming cost.

Graphical libraries include or are complemented by an input subsystem. The input subsystem is event driven: each time the user interacts with an input device, an event recording the interaction is added to an input event queue. The input subsystem API lets the programmer query the input queue and remove events from it. This technique is much more flexible than polling the input devices repeatedly or waiting until an input device is activated. In order to ensure that input events are handled in a timely fashion, the application has to execute an event loop that retrieves the first event in the queue and handles it as fast as possible. While an event sits in the queue, there is a delay between the user action and the system reaction. As a consequence, the event loop sits at the heart of almost every interactive system.
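
The structure of such an event loop is simple. A minimal Python sketch (the event kinds and handler names are invented; a real loop blocks waiting for events instead of draining a queue and exiting):

```python
from collections import deque

event_queue = deque()          # the input subsystem appends events here
handlers = {}                  # event type -> handler function

def post_event(kind, data):
    """Called by the input subsystem when the user acts on a device."""
    event_queue.append((kind, data))

def event_loop():
    """Retrieve the first event in the queue and handle it as fast as
    possible. Here the loop simply drains the queue so the sketch
    terminates; a real application loops for its whole lifetime."""
    while event_queue:
        kind, data = event_queue.popleft()
        handler = handlers.get(kind)
        if handler:
            handler(data)      # must return quickly to keep latency low

clicks = []
handlers["mouse-down"] = clicks.append
post_event("mouse-down", (10, 20))
post_event("key", "a")         # no handler registered: event is dropped
event_loop()
```

The design constraint mentioned above is visible here: each handler must return quickly, because every millisecond spent in a handler is a millisecond during which later events sit in the queue.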

Window systems complement the input subsystem by routing events to the appropriate client application, based on the focus. The focus may be specified explicitly for a device (e.g., the keyboard) or implicitly through the cursor position (the event goes to the window under the cursor). Scene-graph based libraries usually provide a picking service to identify which objects in the scene graph are under or in the vicinity of the cursor.
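A picking service can be sketched as a simple hit test over bounding boxes (a toy model; real libraries walk the scene graph and use more precise geometry):

```python
# Picking sketch: given a cursor position, find which objects lie under it.
# The scene representation and hit test are illustrative, not a real toolkit API.

def pick(objects, x, y):
    """Return the names of objects whose bounding box (x0, y0, x1, y1)
    contains (x, y), topmost (last drawn) first."""
    return [name for name, (x0, y0, x1, y1) in reversed(objects)
            if x0 <= x <= x1 and y0 <= y <= y1]

scene = [("background", (0, 0, 100, 100)),
         ("button",     (10, 10, 30, 20))]
print(pick(scene, 15, 15))   # ['button', 'background']
print(pick(scene, 50, 50))   # ['background']
```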

Although graphical libraries and window systems are fairly low-level, they must often be used when prototyping novel interaction and/or visualization techniques. Usually, these prototypes are developed when performance is key to the success of a design. For example, a zoomable interface that cannot provide continuous zooming at interactive frame rates is unlikely to be usable. The goal of the prototype is then to measure performance in order to validate the feasibility of the design.

User interface toolkits

User interface toolkits are probably the most widely used tools for implementing interactive applications today. All three major platforms (Unix/Linux, MacOS and Windows) come with at least one standard UI toolkit. The main abstraction provided by a UI toolkit is the widget, a software object with three facets that closely match the MVC model: a presentation, a behavior and an application interface.

The presentation defines the graphical aspect of the widget. Usually, the presentation can be controlled by the application, but also externally. For example, under X-Windows, it is possible to change the appearance of widgets in any application by editing a text file specifying the colors, sizes and labels of buttons, menu entries, etc. The overall presentation of an interface is created by assembling widgets into a tree. Widgets such as buttons are the leaves of the tree. Composite widgets constitute the nodes of the tree: a composite widget contains other widgets and controls their arrangement. For example, menu widgets in a menu bar are stacked horizontally, while command widgets in a menu are stacked vertically. Widgets in a dialog box are laid out at fixed positions, or relative to each other so that the layout may be recomputed when the window is resized. Such constraint-based layout saves time because the interface does not need to be re-laid out completely when a widget is added or when its size changes as a result of, e.g., changing its label.
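The widget tree and its layout can be sketched as follows (a toy model, not a real toolkit): composite widgets stack their children horizontally or vertically, and positions are recomputed simply by walking the tree.

```python
# Sketch of a widget tree with composite widgets controlling layout.
# Leaves have a fixed size; a Box stacks its children horizontally or
# vertically and computes its own size from theirs, so re-laying out
# after a size change is just another traversal.

class Widget:
    def __init__(self, w, h):
        self.w, self.h = w, h

    def layout(self, x, y):
        return [(self, x, y)]

class Box(Widget):
    def __init__(self, horizontal, *children):
        self.horizontal = horizontal
        self.children = list(children)
        if horizontal:
            w, h = sum(c.w for c in children), max(c.h for c in children)
        else:
            w, h = max(c.w for c in children), sum(c.h for c in children)
        super().__init__(w, h)

    def layout(self, x, y):
        placed = []
        for child in self.children:
            placed += child.layout(x, y)
            if self.horizontal:
                x += child.w    # next child to the right
            else:
                y += child.h    # next child below
        return placed

# A horizontal row containing a leaf and a vertical stack of two leaves.
row = Box(True, Widget(40, 15), Box(False, Widget(40, 15), Widget(40, 15)))
positions = [(x, y) for _, x, y in row.layout(0, 0)]
print(positions)   # [(0, 0), (40, 0), (40, 15)]
```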

The behavior of a widget defines the interaction methods it supports: a button can be pressed, a scrollbar can be scrolled, a text field can be edited. The behavior also includes the various possible states of a widget. For example, most widgets can be active or inactive, some can be highlighted, etc. The behavior of a widget is usually hardwired and defines its class (menu, button, list, etc.). However it is sometimes parameterized, e.g. a list widget may be set to support single or multiple selection.

One limitation of widgets is that their behavior is limited to the widget itself. Interaction techniques that involve multiple widgets, such as drag-and-drop, cannot be supported by the widgets’ behavior alone and require separate support in the UI toolkit. Some interaction techniques, such as toolglasses or magic lenses (Bier et al., 1993), break the widget model both with respect to the presentation and the behavior and cannot be supported by traditional toolkits. In general, prototyping new interaction techniques requires either implementing them within new widget classes, which is not always possible, or not using a toolkit at all. Implementing a new widget class is typically more complicated than implementing the new technique outside the toolkit, e.g. with a graphical library, and is rarely justified for prototyping. Many toolkits provide a “blank” widget (Canvas in Tk, Drawing Area in Motif, JPanel in Java Swing) that can be used by the application to implement its own presentation and behavior. This is usually a good alternative to implementing a new widget class, even for production code.

The application interface of a widget defines how it communicates the results of user interactions to the rest of the application. Three main techniques exist. The first and most common one is the callback function, or callback for short: when the widget is created, the application registers one or more functions with it. When the widget is activated by the user, it calls the registered functions (Fig. 17). The problem with this approach is that the logic of the application is split among many callback functions (Myers, 1991). The second technique is the active variable: the widget is tied to an application variable, so that a user action changes the variable’s value and a changed value updates the widget (Fig. 18). The third technique is the listener object: the application registers an object with the widget, and user actions are reported to that object as events (Fig. 19).

Fig. 17: Callback functions. The application defines a callback (DoPrint); a user action activates the callback.

Fig. 18: Active variables. The application defines an active variable; a user action changes its value, and a changed value updates the widget.

Fig. 19: Listener objects. The application defines a listener object (PrintDialog p); a user action activates the listener (p.HandleEvent(ev)).
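The three techniques of Figs. 17-19 can be sketched as follows (illustrative classes, not the API of any real toolkit):

```python
# Sketch of the three application-interface techniques: callbacks,
# active variables, and listener objects. All names are illustrative.

# 1. Callback function: registered at widget creation, called on activation.
class Button:
    def __init__(self, callback=None):
        self.callback = callback
    def activate(self):              # simulates a user click
        if self.callback:
            self.callback()

printed = []
def do_print():
    printed.append("printing")
Button(do_print).activate()

# 2. Active variable: the widget is tied to an application variable.
class Field:
    def __init__(self, store, key):
        self.store, self.key = store, key
    def user_types(self, text):      # a user action changes the value...
        self.store[self.key] = text
    def refresh(self):               # ...and a changed value updates the widget
        return self.store[self.key]

state = {"name": ""}
f = Field(state, "name")
f.user_types("Alice")

# 3. Listener object: the widget hands events to a registered object.
class PrintDialog:
    def __init__(self):
        self.events = []
    def handle_event(self, ev):
        self.events.append(ev)

class ListeningButton:
    def __init__(self, listener):
        self.listener = listener
    def activate(self):
        self.listener.handle_event("pressed")

p = PrintDialog()
ListeningButton(p).activate()
print(printed, state["name"], p.events)   # ['printing'] Alice ['pressed']
```

Listeners group related responses in one object, which mitigates the scattering of application logic that plain callbacks suffer from.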

User interface toolkits have been an active area of research over the past 15 years. InterViews (Linton et al., 1989) has inspired many modern toolkits and user interface builders. A number of toolkits have also been developed for specific applications such as groupware (Roseman and Greenberg, 1996, 1999) or visualization (Schroeder et al., 1997).

Creating an application or a prototype with a UI toolkit requires a solid knowledge of the toolkit and experience with programming interactive applications. In order to control the complexity of the inter-relations between independent pieces of code (creation of widgets, callbacks, global variables, etc.), it is important to use well-known design patterns. Otherwise the code quickly becomes unmanageable and, in the case of a prototype, unsuitable for design space exploration. Two categories of tools have been designed to ease the task of developers: user interface builders and application frameworks.

User-interface builders

A user interface builder allows the developer of an interactive system to create the presentation of the user interface, i.e. the tree of widgets, interactively with a graphical editor. The editor features a palette of widgets that the user can use to “draw” the interface in the same way as a graphical editor is used to create diagrams with lines, circles and rectangles. The presentation attributes of each widget can be edited interactively as well as the overall layout. This saves a lot of time that would otherwise be spent writing and fine-tuning rather dull code that creates widgets and specifies their attributes. It also makes it extremely easy to explore and test design alternatives.

User interface builders focus on the presentation of the interface. They also offer some facilities to describe the behavior of the interface and to test the interaction.

Some systems allow the interactive specification of common behaviors such as a menu command opening a dialog box, a button closing a dialog box, a scrollbar controlling a list or text. The user interface builder can then be switched to a “test” mode where widgets are not passive objects but work for real. This may be enough to test prototypes for simple applications, even though there is no functional core nor application data.

In order to create an actual application, the part of the interface generated by the UI builder must be assembled with the missing parts, i.e. the functional core, the application interface code that could not be described from within the builder, and the run-time module of the generator. Most generators save the interface into a file that can be loaded at run-time by the generator’s run-time (Fig. 20). With this method, the application only needs to be re-generated when the functional core changes, not when the user interface changes. This makes it easy to test alternative designs or to iteratively create the interface: each time a new version of the interface is created, it can be readily tested by re-running the application.

Figure 20
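The idea of saving the interface into a file that is loaded at run-time can be sketched as follows (a toy JSON-based description; real builders use their own file formats):

```python
# Sketch of a UI description saved by a builder and instantiated at run-time.
# Editing the saved file changes the interface without recompiling the
# functional core. The format and widget model are illustrative.
import json

saved = json.dumps({"type": "dialog", "children": [
    {"type": "button", "label": "OK"},
    {"type": "button", "label": "Cancel"}]})

def build(desc):
    """Recursively instantiate widgets from the saved description."""
    widget = {"type": desc["type"], "label": desc.get("label")}
    widget["children"] = [build(c) for c in desc.get("children", [])]
    return widget

ui = build(json.loads(saved))
print([c["label"] for c in ui["children"]])   # ['OK', 'Cancel']
```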

In order to make it even easier to modify the interface and test the effects with the real functional core, the interface editor can be built into the target application (Fig. 21). Changes to the interface can then be made from within the application and tested without re-running it. This situation occurs most often with interface builders based on an interpreted language (e.g. Tcl/Tk, Visual Basic).

Figure 21

Figure 22

5.2 Software environments

Application frameworks

Application frameworks address a different problem than user interface builders and are actually complementary to them. Many applications have a standard form where windows represent documents that can be edited with menu commands and tools from palettes; each document may be saved into a disk file; standard functions such as copy/paste, undo and help are supported. Implementing such stereotyped applications with a UI toolkit or UI builder requires replicating a significant amount of code to implement the general logic of the application and the basics of the standard functions.

Application frameworks address this issue by providing a shell that the developer fills with the functional core and the actual presentation of the non-standard parts of the interface. Most frameworks have been inspired by MacApp, a framework developed in the eighties to develop applications for the Macintosh (Apple Computer, 1996). Typical base classes of MacApp include Document, View, Command and Application. MacApp supports multiple document windows, multiple views of a document, cut/copy/paste, undo, saving documents to files, scripting, and more.

With the advent of object-oriented technology, most application frameworks are implemented as collections of classes. Some classes provide services such as help or drag-and-drop and are used as client classes. Many classes are meant to be derived in order to add the application functionality through inheritance rather than by changing the actual code of the framework. This makes it easy to support successive versions of the framework and limits the risks of breaking existing code. Some frameworks are more specialized than MacApp. For example, Unidraw (Vlissides and Linton, 1990) is a framework for creating graphical editors in domains such as technical and artistic drawing, music composition, or circuit design. By addressing a smaller set of applications, such a framework can provide more support and significantly reduce implementation time.
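The inheritance-based approach can be sketched as follows (illustrative classes, loosely inspired by MacApp's Document class): the framework supplies the generic logic, and the developer derives classes and overrides hooks instead of modifying framework code.

```python
# Sketch of a framework class meant to be derived: the framework provides
# the generic life-cycle logic "for free"; the application adds its
# functionality through inheritance. Names are illustrative.

class Document:
    """Framework class: generic save logic, application-defined content."""
    def __init__(self):
        self.saved_data = None

    def save(self):
        # Generic logic provided by the framework.
        self.saved_data = self.serialize()

    def serialize(self):
        # Hook to be overridden by the application.
        raise NotImplementedError

class TextDocument(Document):
    """Application code, added by deriving the framework class."""
    def __init__(self, text=""):
        super().__init__()
        self.text = text

    def serialize(self):
        return self.text

doc = TextDocument("hello")
doc.save()
print(doc.saved_data)   # hello
```

Because the application never edits the framework's own code, a new framework version can usually be adopted without breaking existing subclasses.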

Mastering an application framework takes time. It requires knowledge of the underlying toolkit and the design patterns used in the framework, and a good understanding of the design philosophy of the framework. A framework is useful because it provides a number of functions “for free”, but at the same time it constrains the design space that can be explored. Frameworks can prove effective for prototyping if their limits are well understood by the design team.

Model-based tools

User interface builders and application frameworks approach the development of interactive applications through the presentation side: first the presentation is built; then behavior, i.e., interaction, is added; finally the interface is connected to the functional core. Model-based tools take the other approach, starting with the functional core and domain objects, and working their way towards the user interface and the presentation (Szekely et al., 1992, 1993). The motivation for this approach is that the raison d’être of a user interface is the application data and functions that will be accessed by the user. Therefore it is important to start with the domain objects and related functions and derive the interface from them. The goal of these tools is to provide a semi-automatic generation of the user interface from high-level specifications, including the specification of the domain objects and functions, the specification of user tasks, and the specification of presentation and interaction styles.

Despite significant efforts, the model-based approach is still in the realm of research: no commercial tool exists yet. By attempting to define an interface declaratively, model-based tools rely on a knowledge base of user interface design to be used by the generation tools that transform the specifications into an actual interface. In other words, they attempt to do what designers do when they iteratively and painstakingly create an interactive system. This approach can probably work for well-defined problems with well-known solutions, i.e. families of interfaces that address similar problems. For example, it may be the case that interfaces for Management Information Systems (MIS) could be created with model-based tools because these interfaces are fairly similar and well understood.

In their current form, model-based tools may be useful to create early horizontal or task-based prototypes. In particular they can be used to generate a “default” interface that can serve as a starting point for iterative design. Future systems may be more flexible and therefore usable for other types of prototypes.

User interface development environments

Like model-based tools, user interface development environments (UIDE) attempt to support the development of the whole interactive system. The approach is more pragmatic than the model-based approach, however. It consists of assembling a number of tools into an environment where different aspects of an interactive system can be specified and generated separately.

Garnet (Myers et al., 1990) is an example of such an environment: a set of higher-level tools built on top of the Garnet toolkit (Fig. 23).

Fig. 23: The Garnet environment: Garnet tools built on top of the Garnet toolkit.

Fig. 24: Sketching a user interface with Silk.

Silk (Landay & Myers, 2001) is a tool aimed at the early stages of design, when interfaces are sketched rather than prototyped in software. Using Silk, a user can sketch a user interface directly on the screen (Fig. 24). Using gesture recognition, Silk interprets the marks as widgets, annotations, etc. Even in its sketched form, the user interface is functional: buttons can be pressed, tools can be selected in a toolbar, etc. The sketch can also be turned into an actual interface, e.g. using the Motif toolkit. Finally, storyboards can be created to describe and test sequences of interactions. Silk therefore combines some aspects of off-line and on-line prototyping techniques, trying to get the best of both worlds. This illustrates a current trend in research where on-line tools attempt to support not only the development of the final system, but the whole design process.

6. Evolutionary Prototypes

Evolutionary prototypes are a special case of iterative prototypes that are intended to evolve into the final system. Methodologies such as Extreme Programming (Beck, 2000) consist mostly of developing evolutionary prototypes.

Since prototypes are rarely robust or complete, it is often impractical and sometimes dangerous to evolve them into the final system. Designers must think carefully about the underlying software architecture of the prototype, and developers should use well-documented design patterns to implement them.

6.1 Software architectures

The definition of the software architecture is traditionally done after the functional specification is written, but before coding starts. The designers decide on the structure of the application and how functions will be implemented by software modules. The software architecture is the assignment of functions to modules. Ideally, each function should be implemented by a single module and modules should have minimal dependencies among them. Poor architectures increase development costs (coding, testing and integration), lower maintainability, and reduce performance. An architecture designed to support prototyping and evolution is crucial to ensure that design alternatives can be tested with maximum flexibility and at a reasonable cost.

Seeheim and Arch

The first generic architecture for interactive systems was devised at a workshop in Seeheim (Germany) in 1985 and is known as the Seeheim model (Pfaff, 1985). It separates the interactive application into a user interface and a functional core (then called “application”, because the user interface was seen as adding a “coat of paint” on top of an existing application). The user interface is made of three modules: the presentation, the dialogue controller, and the application interface (Fig. 25). The presentation deals with capturing the user’s input at a low level (often called the lexical level, by comparison with the lexical, syntactic and semantic levels of a compiler). The presentation is also responsible for generating output to the user, usually as a visual display. The dialogue controller assembles the user input into commands (a.k.a. the syntactic level), provides immediate feedback for the action being carried out, such as a rubber-band line, and detects errors. Finally, the application interface interprets the commands into calls to the functional core (a.k.a. the semantic level). It also interprets the results of these calls and turns them into output to be presented to the user.

Figure 25

The Arch model later refined the Seeheim model by decomposing an interactive system into five components: the domain-specific component (the functional core), the domain adapter, the dialogue component, the presentation component and the interaction toolkit, arranged in an arch with the dialogue component at the keystone (Fig. 26).

Figure 26

MVC and PAC

Architecture models such as Seeheim and Arch are abstract models and are thus rather imprecise. They deal with categories of modules such as presentation or dialogue, when in an actual architecture several modules will deal with presentation and several others with dialogue.

The Model-View-Controller or MVC model (Krasner and Pope, 1988) is much more concrete. MVC was created for the implementation of the Smalltalk-80 environment (Goldberg & Robson, 1983) and is implemented as a set of Smalltalk classes. The model describes the interface of an application as a collection of triplets of objects. Each triplet contains a model, a view and a controller. A Model represents information that needs to be represented and interacted with. It is controlled by application objects. A View displays the information in a model in a certain way. A Controller interprets user input on the view and transforms it into changes in the model. When a model changes, it notifies its views so the display can be updated.

Views and controllers are tightly coupled and sometimes implemented as a single object. A model is abstract when it has no view and no controller. It is non-interactive if it has a view but no controller. The MVC triplets are usually composed into a tree: e.g., a model represents the whole interface; its components are themselves models, such as the menu bar and the document windows, all the way down to individual interface elements such as buttons and scrollbars. MVC supports multiple views fairly easily: the views share a single model; when a controller modifies the model, all the views are notified and update their presentation.
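A minimal sketch of the MVC protocol (illustrative code, not the Smalltalk-80 implementation): the controller turns user input into a model change, and the model notifies every attached view.

```python
# Minimal MVC sketch: a model notifies its views when it changes;
# a controller transforms user input into changes in the model.

class Model:
    def __init__(self, value=0):
        self.value = value
        self.views = []

    def attach(self, view):
        self.views.append(view)

    def set(self, value):
        self.value = value
        for view in self.views:      # notify every view of the change
            view.update(self)

class View:
    def __init__(self, model):
        self.display = None
        model.attach(self)

    def update(self, model):
        self.display = f"value = {model.value}"

class Controller:
    def __init__(self, model):
        self.model = model

    def user_input(self, value):     # interpret input as a model change
        self.model.set(value)

m = Model()
v1, v2 = View(m), View(m)            # two views sharing a single model
Controller(m).user_input(42)
print(v1.display, "|", v2.display)   # value = 42 | value = 42
```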

The Presentation-Abstraction-Control model, or PAC (Coutaz, 1987), is close to MVC. Like MVC, an architecture based on PAC is made of a set of objects, called PAC agents, organized in a tree. A PAC agent has three facets: the Presentation takes care of capturing user input and generating output; the Abstraction holds the application data, like a Model in MVC; the Control facet manages the communication between the abstraction and presentation facets of the agent, and with sub-agents and super-agents in the tree. Like MVC, PAC easily supports multiple views. Unlike MVC, PAC is an abstract model, i.e. there is no reference implementation.

A variant of MVC, called MVP (Model-View-Presenter), is very close to PAC and is used in ObjectArts' Dolphin Smalltalk. Other architecture models have been created for specific purposes such as groupware (Dewan, 1999) or graphical applications (Fekete and Beaudouin-Lafon, 1996).

6.2 Design patterns

Architecture models such as Arch or PAC only address the overall design of interactive software. PAC is more fine-grained than Arch, and MVC is more concrete since it is based on an implementation. Still, a user interface developer has to address many issues in order to turn an architecture into a working system.

Design patterns have emerged in recent years as a way to capture effective solutions to recurrent software design problems. In their book, Gamma et al. (1995) present 23 patterns. It is interesting to note that many of these patterns come from interactive software, and most of them can be applied to the design of interactive systems. It is beyond the scope of this chapter to describe these patterns in detail. However it is interesting that most patterns for interactive systems are behavioral patterns, i.e. patterns that describe how to implement the control structure of the system.

Indeed, there is a battle for control in interactive software. In traditional, algorithmic software, the algorithm is in control and decides when to read input and write output. In interactive software, the user interface needs to be in control because user input should drive the system’s reactions. Unfortunately, more often than not, the functional core also needs to be in control. This is especially common when creating user interfaces for legacy applications. In the Seeheim and Arch models, it is often believed that control is located in the dialogue controller when in fact these architecture models do not explicitly address the issue of control. In MVC, the three basic classes Model, View and Controller implement a sophisticated protocol to ensure that user input is taken into account in a timely manner and that changes to a model are properly reflected in the view (or views). Some authors actually describe MVC as a design pattern, not an architecture. In fact it is both: the inner workings of the three basic classes are a pattern, but the decomposition of the application into a set of MVC triplets is an architectural issue.

It is now widely accepted that interactive software is event-driven, i.e. the execution is driven by the user’s actions, leading to control localized in the user interface components. Design patterns such as Command, Chain of Responsibility, Mediator, and Observer (Gamma et al., 1995) are especially useful to implement the transformation of low-level user events into higher-level commands, to find out which object in the architecture responds to a command, and to propagate the changes produced by a command from internal objects of the functional core to user interface objects.
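For example, the Command pattern can be sketched as follows (illustrative code): low-level events are transformed into command objects that can be executed, logged and undone.

```python
# Sketch of the Command pattern (after Gamma et al., 1995): a command
# encapsulates a change to the application state and knows how to undo it.
# The event format and the InsertText command are illustrative.

class InsertText:
    def __init__(self, buffer, text):
        self.buffer, self.text = buffer, text

    def execute(self):
        self.buffer.append(self.text)

    def undo(self):
        self.buffer.pop()

history, buffer = [], []

def dispatch(event):
    """Translate a low-level key event into a command, run it, and log it."""
    command = InsertText(buffer, event["key"])
    command.execute()
    history.append(command)

dispatch({"type": "key", "key": "a"})
dispatch({"type": "key", "key": "b"})
history.pop().undo()      # undo the most recent command
print(buffer)             # ['a']
```

Keeping the executed commands in a history list is what makes multi-level undo essentially free with this pattern.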

Using design patterns to implement an interactive system not only saves time, it also makes the system more open to changes and easier to maintain. Therefore software prototypes should be implemented by experienced developers who know their pattern language and who understand the need for flexibility and evolution.

7. Summary

Prototyping is an essential component of interactive system design. Prototypes may take many forms, from rough sketches to detailed working prototypes. They provide concrete representations of design ideas and give designers, users, developers and managers an early glimpse into how the new system will look and feel. Prototypes increase creativity, allow early evaluation of design ideas, help designers think through and solve design problems, and support communication within multi-disciplinary design teams.

Prototypes, because they are concrete and not abstract, provide a rich medium for exploring a design space. They suggest alternate design paths and reveal important details about particular design decisions. They force designers to be creative and to articulate their design decisions. Prototypes embody design ideas and encourage designers to confront their differences of opinion. The precise aspects of a prototype offer specific design solutions: designers can then decide to generate and compare alternatives. The imprecise or incomplete aspects of a prototype highlight the areas that must be refined or require additional ideas.

We begin by defining prototypes and then discuss them as design artifacts. We introduce four dimensions by which they can be analyzed: representation, precision, interactivity and evolution. We then discuss the role of prototyping within the design process and explain the concept of creating, exploring and modifying a design space. We briefly describe techniques for generating new ideas, to expand the design space, and techniques for choosing among design alternatives, to contract the design space.

We describe a variety of rapid prototyping techniques for exploring ideas quickly and inexpensively in the early stages of design, including off-line techniques (from paper and pencil to video) and on-line techniques (from fixed to interactive simulations). We then describe iterative prototyping techniques for working out the details of the on-line interaction, including software development tools and software environments. We conclude with evolutionary prototyping techniques, which are designed to evolve into the final software system, including a discussion of the underlying software architectures, design patterns and extreme programming.

This chapter has focused mostly on graphical user interfaces (GUIs) that run on traditional workstations. Such applications are dominant today, but this is changing as new devices are being introduced, from cell-phones and PDAs to wall-size displays. New interaction styles are emerging, such as augmented reality, mixed reality and ubiquitous computing. Designing new interactive devices and the interactive software that runs on them is becoming ever more challenging: whether aimed at a wide audience or a small number of specialists, the hardware and software systems must be adapted to their contexts of use. The methods, tools and techniques presented in this chapter can easily be applied to these new applications.

We view design as an active process of working with a design space, expanding it by generating new ideas and contracting as design choices are made. Prototypes are flexible tools that help designers envision this design space, reflect upon it, and test their design decisions. Prototypes are diverse and can fit within any part of the design process, from the earliest ideas to the final details of the design. Perhaps most important, prototypes provide one of the most effective means for designers to communicate with each other, as well as with users, developers and managers, throughout the design process.

8. References

Apple Computer (1996). Programmer's Guide to MacApp.

Beaudouin-Lafon, M. (2000). Instrumental Interaction: An Interaction Model for Designing Post-WIMP User Interfaces. Proceedings of ACM Human Factors in Computing Systems, CHI 2000, pp. 446-453, ACM Press.

Beaudouin-Lafon, M. (2001). Novel Interaction Techniques for Overlapping Windows. Proceedings of ACM Symposium on User Interface Software and Technology, UIST 2001, CHI Letters 3(2), ACM Press. In press.

Beaudouin-Lafon, M. and Lassen, M. (2000). The Architecture and Implementation of a Post-WIMP Graphical Application. Proceedings of ACM Symposium on User Interface Software and Technology, UIST 2000, CHI Letters 2(2):181-190, ACM Press.

Beaudouin-Lafon, M. and Mackay, W. (2000). Reification, Polymorphism and Reuse: Three Principles for Designing Visual Interfaces. In Proceedings of the Conference on Advanced Visual Interfaces, AVI 2000, Palermo, Italy, May 2000, pp. 102-109.

Beck, K. (2000). Extreme Programming Explained. New York: Addison-Wesley.

Bederson, B. and Hollan, J. (1994). Pad++: A Zooming Graphical Interface for Exploring Alternate Interface Physics. Proceedings of ACM Symposium on User Interface Software and Technology, UIST’94, pp. 17-26, ACM Press.

Bederson, B. and Meyer, J. (1998). Implementing a Zooming Interface: Experience Building Pad++. Software Practice and Experience, 28(10):1101-1135.

Bederson, B.B., Meyer, J., Good, L. (2000) Jazz: An Extensible Zoomable User Interface Graphics ToolKit in Java. Proceedings of ACM Symposium on User Interface Software and Technology, UIST 2000, CHI Letters 2(2):171-180, ACM Press.

Bier, E., Stone, M., Pier, K., Buxton, W., De Rose, T. (1993). Toolglass and Magic Lenses: The See-Through Interface. Proceedings of ACM SIGGRAPH, pp. 73-80, ACM Press.

Boehm, B. (1988). A Spiral Model of Software Development and Enhancement. IEEE Computer, 21(5):61-72.

Bødker, S., Ehn, P., Knudsen, J., Kyng, M. and Madsen, K. (1988). Computer support for cooperative design. In Proceedings of CSCW'88, ACM Conference on Computer-Supported Cooperative Work. Portland, OR: ACM Press, pp. 377-393.

Chapanis, A. (1982) Man/Computer Research at Johns Hopkins, Information Technology and Psychology: Prospects for the Future. Kasschau, Lachman & Laughery (Eds.) Praeger Publishers, Third Houston Symposium, NY, NY.

Collaros, P.A., Anderson, L.R. (1969), Effect of perceived expertness upon creativity of members of brainstorming groups. Journal of Applied Psychology, 53, 159-163.

Coutaz, J. (1987). PAC, an Object Oriented Model for Dialog Design. In Proceedings of INTERACT’87, Bullinger, H.-J. and Shackel, B. (eds.), pp. 431-436, Elsevier Science Publishers.

Dewan, P. (1999). Architectures for Collaborative Applications. In Beaudouin-Lafon, M. (ed.), Computer-Supported Co-operative Work, Trends in Software Series, Wiley, pp. 169-193.

Dykstra-Erickson, E., Mackay, W.E. and Arnowitz, J. (March, 2001). Trialogue on Design of. ACM Interactions, pp. 109-117.

Dourish, P. (1997). Accounting for System Behaviour: Representation, Reflection and Resourceful Action. In Kyng and Mathiassen (eds), Computers and Design in Context. Cambridge: MIT Press, pp.145-170.

Eckstein, R., Loy, M. and Wood, D. (1998). Java Swing. Cambridge MA: O’Reilly.

Fekete, J-D. and Beaudouin-Lafon, M. (1996). Using the Multi-layer Model for Building Interactive Graphical Applications. In Proc. ACM Symposium on User Interface Software and Technology, UIST'96, ACM Press, p. 109-118.

Gamma, E., Helm, R., Johnson, R., Vlissides, J. (1995). Design Patterns, Elements of Reusable Object-Oriented Software. Reading MA: Addison Wesley.

Gibson, J. J. (1979). The Ecological Approach to Visual Perception. Boston: Houghton Mifflin.

Goldberg, A. and Robson, D. (1983). Smalltalk--80: The language and its implementation. Reading MA: Addison Wesley.

Goodman, D. (1987). The Complete HyperCard Handbook. New York: Bantam Books.

Greenbaum, J. and Kyng, M., eds (1991). Design at Work: Cooperative Design of Computer Systems. Hillsdale NJ: Lawrence Erlbaum Associates.

Houde, S. and Hill, C. (1997). What do Prototypes Prototype? In Handbook of Human-Computer Interaction, 2nd completely revised edition. Amsterdam: North-Holland, pp. 367-381.

Kelley, J.F. (1983) An empirical methodology for writing user-friendly natural language computer applications. In Proceedings of CHI '83 Conference on Human Factors in Computing Systems. Boston, Massachusetts.

Krasner, G.E. and Pope, S.T. (1988). A Cookbook for Using the Model-View-Controller User Interface Paradigm in Smalltalk-80. Journal of Object-Oriented Programming, August/September 1988, pp. 27-49.

Kurtenbach, G., Fitzmaurice, G., Baudel, T., Buxton. W. (1997). The Design of a GUI Paradigm based on Tablets, Two-hands, and Transparency. Proceedings of ACM Human Factors in Computing Systems, CHI'97, pp.35-42, ACM Press.

Landay, J. and Myers, B.A. (2001). Sketching Interfaces: Toward More Human Interface Design. IEEE Computer, 34(3):56-64.

Linton, M.A., Vlissides, J.M., Calder, P.R. (1989). Composing user interfaces with InterViews, IEEE Computer, 22(2):8-22.

Mackay, W.E. (1988) Video Prototyping: A technique for developing hypermedia systems. Demonstration in Proceedings of CHI'88, Conference on Human Factors in Computing, Washington, D.C.

Mackay, W.E. and Pagani, D. (1994) Video Mosaic: Laying out time in a physical space. Proceedings of ACM Multimedia '94. San Francisco, CA: ACM, pp.165-172.

Mackay, W.E. and Fayard, A-L. (1997). HCI, Natural Science and Design: A Framework for Triangulation Across Disciplines. Proceedings of ACM DIS '97, Designing Interactive Systems. Amsterdam, The Netherlands: ACM/SIGCHI, pp. 223-234.

Mackay, W.E. (2000) Video Techniques for Participatory Design: Observation, Brainstorming & Prototyping. Tutorial Notes, CHI 2000, Human Factors in Computing Systems. The Hague, the Netherlands. (148 pages) URL: www.lri.fr/~mackay/publications

Mackay, W., Ratzer, A. and Janecek, P. (2000) Video Artifacts for Design: Bridging the Gap between Abstraction and Detail. Proceedings ACM Conference on Designing Interactive Systems, DIS 2000, pp.72-82, ACM Press.

Myers, B.A., Giuse, D.A., Dannenberg, R.B., Vander Zander, B., Kosbie, D.S., Pervin, E., Mickish, A., Marchal, P. (1990). Garnet: Comprehensive Support for Graphical, Highly-Interactive User Interfaces. IEEE Computer, 23(11):71-85.

Myers, B.A. (1991). Separating application code from toolkits: Eliminating the spaghetti of call-backs. Proceedings of ACM SIGGRAPH Symposium on User Interface Software and Technology, UIST '91, pp.211-220.

Myers, B.A. and Rosson, M.B. (1992). Survey on user interface programming.

In ACM Conference on Human Factors in Computing Systems, CHI’92, pp.195

202, ACM Press. Myers, B.A., McDaniel, R.G., Miller, R.C., Ferrency, A.S., Faulring, A., Kyle, B.D., Mickish, A., Klimotivtski, A., Doane, P. (1997). The Amulet environment. IEEE Transactions on Software Engineering, 23(6):347 - 365.

NeXT Corporation (1991). NeXT Interface Builder Reference Manual. Redwood City, California.

Norman, D.A. and Draper S.W., eds (1986). User Centered System Design. Hillsdale NJ: Lawrence Erlbaum Associates.

Osborn, A. (1957), Applied imagination: Principles and procedures of creative

thinking (rev. ed.), New York: Scribner's. Ousterhout, J.K. (1994). Tcl and the Tk Toolkit. Reading MA: Addison Wesley. Perkins, R., Keller, D.S. and Ludolph, F (1997). Inventing the Lisa User Interface. ACM Interactions, 4(1):40-53.

Pfaff, G.P. and P. J. W. ten Hagen, P.J.W., eds (1985). User Interface Management Systems. Berlin: Springer.

Raskin, J. (2000). The Humane Interface. New York: Addison-Wesley. Roseman, M. and Greenberg, S. (1999). Groupware Toolkits for Synchronous Work. In Beaudouin-Lafon, M. (ed.), Computer-Supported Co-operative Work, Trends in Software Series, Wiley, pp.135-168.

Roseman, M. and Greenberg, S. (1996). Building real-time groupware with GroupKit, a groupware toolkit. ACM Transactions on Computer-Human Interaction, 3(1):66-106.

Schroeder, W., Martin, K., Lorensen, B. (1997). The Visualization Toolkit. Prentice Hall.

Strass, P. (1993) IRIS Inventor, a 3D Graphics Toolkit. Proceedings ACM Conference on Object-Oriented Programming, Systems, Languages and Applications, OOPSLA '93, pp.192-200.

Szekely, P., Luo, P. and Neches, R. (1992). Facilitating the Exploration of Interface Design Alternatives: The HUMANOID. Proceedings of ACM Conference on Human Factors in Computing Systems, CHI’92, pp.507-515. Szekely, P., Luo, P. and Neches, R. (1993). Beyond Interface Builders: Model based Interface Tools. Proceedings of ACM/IFIP Conference on Human Factors in Computing Systems, INTERCHI’93, pp.383-390.

The UIMS Workshop Tool Developers (1992). A Metamodel for the Runtime Architecture of an Interactive System. SIGCHI Bulletin, 24(1):32-37. Vlissides, J.M. and Linton, M.A. (1990). Unidraw: a framework for building domain-specific graphical editors. ACM Transactions on Information Systems, 8(3):237 - 268.

Wegner, P. (1997). Why Interaction is More Powerful Than Algorithms. Communications of the ACM, 40(5):80-91.

Woo, M., Neider, J. and Davis, T. (1997) OpenGL Programming Guide, Reading MA: Addison-Wesley

Handbook of Human-Computer Interaction, Second, completely revised edition

M. Helander, T.K. Landauer, P. Prabhu (eds.)

© 1997 Elsevier Science B.V. All rights reserved.

Chapter 16

What do Prototypes Prototype?

Stephanie Houde and Charles Hill

Apple Computer, Inc. Cupertino, California, USA

16.1 Introduction .................................................... 367

16.2 The Problem with Prototypes ........................ 367
16.2.1 What is a Prototype? ................................ 368
16.2.2 Current Terminology ................................ 368

16.3 A Model of What Prototypes Prototype ....... 369
16.3.1 Definitions ................................................ 369
16.3.2 The Model ................................................ 369
16.3.3 Three Prototypes of One System ............. 369

16.4 Further Examples ........................................... 371
16.4.1 Role Prototypes ........................................ 372
16.4.2 Look and Feel Prototypes ........................ 374
16.4.3 Implementation Prototypes ...................... 376
16.4.4 Integration Prototypes .............................. 377

16.5 Summary ......................................................... 379

16.6 Acknowledgments ........................................... 380

16.7 Prototype Credits ........................................... 380

16.8 References ....................................................... 380

16.1 Introduction

Prototypes are widely recognized to be a core means of exploring and expressing designs for interactive computer artifacts. It is common practice to build prototypes in order to represent different states of an evolving design and to explore options. However, since interactive systems are complex, it may be difficult or impossible to create prototypes of a whole design in the formative stages of a project. Choosing the right kind of more focused prototype to build is an art in itself, and communicating its limited purposes to its various audiences is a critical aspect of its use.

The ways that we talk, and even think, about prototypes can get in the way of their effective use. Current terminology for describing prototypes centers on attributes of prototypes themselves, such as what tool was used to create them, and how refined-looking or -behaving they are. Such terms can be distracting. Tools can be used in many different ways, and detail is not a sure indicator of completion.

We propose a change in the language used to talk about prototypes, to focus more attention on fundamental questions about the interactive system being designed: What role will the artifact play in a user's life? How should it look and feel? How should it be implemented? The goal of this chapter is to establish a model that describes any prototype in terms of the artifact being designed, rather than the prototype's incidental attributes. By focusing on the purpose of the prototype--that is, on what it prototypes--we can make better decisions about the kinds of prototypes to build. With a clear purpose for each prototype, we can better use prototypes to think and communicate about design.

In the first section we describe some current difficulties in communicating about prototypes: the complexity of interactive systems; issues of multi-disciplinary teamwork; and the audiences of prototypes. Next, we introduce the model and illustrate it with some initial examples of prototypes from real projects. In the following section we present several more examples to illustrate some further issues. We conclude the chapter with a summary of the main implications of the model for prototyping practice.

16.2 The Problem with Prototypes

Interactive computer systems are complex. Any artifact can have a rich variety of software, hardware, auditory, visual, and interactive features. For example, a personal digital assistant such as the Apple Newton has an operating system, a hard case with various ports, a graphical user interface, and audio feedback. Users experience the combined effect of such interrelated features; and the task of designing--and prototyping--the user experience is therefore complex. Every aspect of the system must be designed (or inherited from a previous system), and many features need to be evaluated in combination with others.

Prototypes provide the means for examining design problems and evaluating solutions. Selecting the focus of a prototype is the art of identifying the most important open design questions. If the artifact is to provide new functionality for users--and thus play a new role in their lives--the most important questions may concern exactly what that role should be and what features are needed to support it. If the role is well understood, but the goal of the artifact is to present its functionality in a novel way, then prototyping must focus on how the artifact will look and feel. If the artifact's functionality is to be based on a new technique, questions of how to implement the design may be the focus of prototyping efforts.

Once a prototype has been created, there are several distinct audiences that designers discuss prototypes with. These are: the intended users of the artifact being designed; their design teams; and the supporting organizations that they work within (Erickson, 1995). Designers evaluate their options with their own team by critiquing prototypes of alternate design directions. They show prototypes to users to get feedback on evolving designs. They show prototypes to their supporting organizations (such as project managers, business clients, or professors) to indicate progress and direction.

It is difficult for designers to communicate clearly about prototypes to such a broad audience. It is challenging to build prototypes which produce feedback from users on the most important design questions. Even communication among designers requires effort, due to differing perspectives in a multi-disciplinary design team. Limited understanding of design practice on the part of supporting organizations makes it hard for designers to explain their prototypes to them. Finally, prototypes are not self-explanatory: looks can be deceiving. Clarifying what aspects of a prototype correspond to the eventual artifact--and what don't--is a key part of successful prototyping.

Designing interactive systems demands collaboration between designers of many different disciplines (Kim, 1990). For example, a project might require the skills of a programmer, an interaction designer, an industrial designer, and a project manager. Even the term "prototype" is likely to be ambiguous on such a team. Everyone has a different expectation of what a prototype is. Industrial designers call a molded foam model a prototype. Interaction designers refer to a simulation of on-screen appearance and behavior as a prototype. Programmers call a test program a prototype. A user studies expert may call a storyboard which shows a scenario of something being used a prototype.

The organization supporting a design project may have an overly narrow expectation of what a prototype is. Schrage (1996) has shown that organizations develop their own "prototyping cultures" which may cause them to consider only certain kinds of prototypes to be valid. In some organizations, only prototypes which act as proof that an artifact can be produced are respected. In others, only highly detailed representations of look and feel are well understood.

16.2.1 What is a Prototype?

Is a brick a prototype? The answer depends on how it is used. If it is used to represent the weight and scale of some future artifact, then it certainly is: it prototypes the weight and scale of the artifact. This example shows that prototypes are not necessarily self-explanatory. What is significant is not what media or tools are used to create them, but how they are used by a designer to explore or demonstrate some aspect of the future artifact.

16.2.2 Current Terminology

Current ways of talking about prototypes tend to focus on attributes of the prototype itself, such as which tool was used to create it (as in "C", "Director™", and "paper" prototypes); and on how finished-looking or -behaving a prototype is (as in "high-fidelity" and "low-fidelity" prototypes). Such characterizations can be misleading because the capabilities and possible uses of tools are often misunderstood, and the significance of the level of finish is often unclear, particularly to non-designers.

Tools can be used in many different ways. Sometimes tools which have high-level scripting languages (like HyperCard™), rather than full programming languages (like C), are thought to be unsuitable for producing user-testable prototypes. However, Ehn and Kyng (1991) have shown that even prototypes made of cardboard are very useful for user testing. In the authors' experience, no one tool supports iterative design work in all of the important areas of investigation. To design well, designers must be willing to use different tools for different prototyping tasks; and to team up with other people with complementary skills.

Finished-looking (or -behaving) prototypes are often thought to indicate that the design they represent is near completion. Although this may sometimes be the case, a finished-looking prototype might be made early in the design process (e.g., a 3D concept model for use in market research), and a rough one might be made later on (e.g., to emphasize overall structure rather than visual details in a user test).

Two related terms are used in this context: "resolution" and "fidelity". We interpret resolution to mean "amount of detail", and fidelity to mean "closeness to the eventual design". It is important to recognize that the degree of visual and behavioral refinement of a prototype does not necessarily correspond to the solidity of the design, or to a particular stage in the process.

16.3 A Model of What Prototypes Prototype

16.3.1 Definitions

Before proceeding, we define some important terms. We define artifact as the interactive system being designed. An artifact may be a commercially released product or any end-result of a design activity, such as a concept system developed for research purposes. We define prototype as any representation of a design idea, regardless of medium. This includes a pre-existing object when used to answer a design question. We define designer as anyone who creates a prototype in order to design, regardless of job title.

Figure 1. A model of what prototypes prototype.

16.3.2 The Model

The model shown in Figure 1 represents a three-dimensional space which corresponds to important aspects of the design of an interactive artifact. We define the dimensions of the model as role; look and feel; and implementation. Each dimension corresponds to a class of questions which are salient to the design of any interactive system. "Role" refers to questions about the function that an artifact serves in a user's life--the way in which it is useful to them. "Look and feel" denotes questions about the concrete sensory experience of using an artifact--what the user looks at, feels, and hears while using it. "Implementation" refers to questions about the techniques and components through which an artifact performs its function--the "nuts and bolts" of how it actually works. The triangle is drawn askew to emphasize that no one dimension is inherently more important than any other.

Goal of the Model: Given a design problem (of any scope or size), designers can use the model to separate design issues into three classes of questions which frequently demand different approaches to prototyping. Implementation usually requires a working system to be built; look and feel requires the concrete user experience to be simulated or actually created; role requires the context of the artifact's use to be established. Being explicit about what design questions must be answered is therefore an essential aid to deciding what kind of prototype to build. The model helps visualize the focus of exploration.

Markers: A prototype may explore questions or design options in one, two, or all three dimensions of the model. In this chapter, several prototypes from real design projects are presented as examples. Their relationship to the model is represented by a marker on the triangle. This is a simple way to put the purpose of any prototype in context for the designer and their audiences. It gives a global sense of what the prototype is intended to explore; and, equally important, what it does not explore.
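One way to read the marker idea is geometrically: a marker is a set of subjective weights over the three dimensions, mapped to a point inside the triangle as a weighted average of the corner positions (barycentric coordinates). The sketch below is purely illustrative and not from the chapter; the corner coordinates, function name, and weight values are all invented for the example.

```python
# Illustrative sketch (not from the chapter): placing a prototype's
# "marker" on the role / look-and-feel / implementation triangle.
# A marker is modeled as three subjective emphasis weights; the 2D
# position is the weights' normalized average of the corner coordinates.

CORNERS = {
    "role": (0.0, 0.0),
    "look_and_feel": (1.0, 0.2),   # the triangle is drawn askew on purpose
    "implementation": (0.5, 1.0),
}

def marker_position(weights):
    """Map subjective emphasis weights to a point inside the triangle."""
    total = sum(weights.values())
    x = sum(w / total * CORNERS[d][0] for d, w in weights.items())
    y = sum(w / total * CORNERS[d][1] for d, w in weights.items())
    return (x, y)

# Example 1 mostly explored role, with a little look and feel,
# so its marker lands close to the role corner:
example_1 = {"role": 0.8, "look_and_feel": 0.2, "implementation": 0.0}
print(marker_position(example_1))
```

Equal weights place a marker at the center of the triangle, which is how the chapter later characterizes integration prototypes.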

It may be noted that the triangle is a relative and subjective representation. A location toward one corner of the triangle implies simply that, in the designer's own judgment, the prototype gives more attention to the class of questions represented by that corner than to the others.

16.3.3 Three Prototypes of One System

The model is best explained further through an example from a real project. The three prototypes shown in Examples 1-3 were created during the early stages of development of a 3D space-planning application (Houde, 1992). The goal of the project was to design an example of a 3D application which would be accessible to a broad range of non-technical users. As such, it was designed to work on a personal computer with an ordinary mouse. Many prototypes were created by different members of the multi-disciplinary design team during the project.

Figure 2. Relationship of the prototypes (Examples 1-3) to the model.

Example 1. Role prototype for 3D space-planning application [E1: Houde 1990].

The prototype shown in Example 1 was built to show how a user might select furniture from an on-line catalog and try it out in an approximation of their own room. It is an interactive slide show which the designer operated by clicking on key areas of the rough user interface. The idea that virtual space-planning would be a helpful task for non-technical users came from user studies. The purpose of the prototype was to quickly convey the proposed role of the artifact to the design team and members of the supporting organization.

Since the purpose of the prototype was primarily to explore and visualize an example of the role of the future artifact, its marker appears very near the role corner of the model in Figure 2. It is placed a little toward the look and feel corner because it also explored user interface elements in a very initial form.

One of the challenges of the project was to define an easy-to-use direct manipulation user interface for moving 3D objects with an ordinary 2D mouse cursor. User testing with a foam-core model showed that the most important manipulations of a space-planning task were sliding, lifting, and turning furniture objects. Example 2 shows a picture of a prototype which was made to test a user interface featuring this constrained set of manipulations. Clicking once on the chair caused its bounding box to appear. This "handle box" offered hand-shaped controls for lifting and turning the box and the chair object (as if the chair was frozen inside the box). Clicking and dragging anywhere on the box allowed the unit to slide on a 3D floor. The prototype was built using Macromedia Director (a high-level animation and scripting tool). It was made to work only with the chair data shown: a set of images pre-drawn for many angles of rotation.

Example 2. Look and feel prototype for 3D space-planning application [E2: Houde 1990].

The purpose of the Example 2 prototype was to get feedback from users as quickly as possible as to whether the look and feel of the handle-box user interface was promising. Users of the prototype were given tasks which encouraged them to move the chair around a virtual room. Some exploration of role was supported by the fact that the object manipulated was a chair, and space-planning tasks were given during the test. Although the prototype was interactive, the programming that made it so did not seriously explore how a final artifact with this interface might be implemented. It was only done in service of the look and feel test. Since the designer primarily explored the look and feel of the user interface, this prototype's marker is placed very near the look and feel corner of the model in Figure 2.

A technical challenge of the project was figuring out how to render 3D graphics quickly enough on equipment that end-users might have.

Example 3. Implementation prototype for 3D space-planning application [E3: Chen 1990].

At the time, it was not clear how much real-time 3D interaction could be achieved on the Apple Macintosh™ IIfx computer--the fastest Macintosh then available. Example 3 shows a prototype which was built primarily to explore rendering capability and performance. This was a working prototype in which multiple 3D objects could be manipulated as in Example 2, and the view of the room could be changed to any perspective. Example 3 was made in a programming environment that best supported the display of true 3D perspectives during manipulation. It was used by the design team to determine what complexity of 3D scenes was reasonable to design for. The user interface elements shown on the left side of the screen were made by the programmer to give himself controls for demonstrating the system: they were not made to explore the look and feel of the future artifact.

Thus the primary purpose of the prototype was to explore how the artifact might be implemented. The marker for this example is placed near the implementation corner (Figure 2).

One might assume that the role prototype (Example 1) was developed first, then the look and feel prototype (Example 2), and finally the implementation prototype (Example 3): that is, in order of increasing detail and production difficulty. In fact, these three prototypes were developed almost in parallel. They were built by different design team members during the early stages of the project. No single prototype could have represented the design of the future artifact at that time. The evolving design was too fuzzy--existing mainly as a shared concept in the minds of the designers. There were also too many open and interdependent questions in every design dimension: role, look and feel, and implementation.

Making separate prototypes enabled specific design questions to be addressed with as much clarity as possible. The solutions found became inputs to an integrated design. Answers to the rendering capability questions addressed by Example 3 informed the design of the role that the artifact could play (guiding how many furniture objects of what complexity could be shown). It also provided guiding constraints for the direct manipulation user interface (determining how much detail the handle forms could have). Similarly, issues of role addressed by Example 1 informed the implementation problem by constraining it: only a constrained set of manipulations was needed for a space-planning application. It also simplified the direct manipulation user interface by limiting the necessary actions, and therefore controls, which needed to be provided.

Figure 3. The four main regions of the model.

It was more efficient to wait on the results of independent investigations in the key areas of role, look and feel, and implementation than to try to build a monolithic prototype that integrated all features from the start. After sufficient investigation in separate prototypes, the prototype in Example 3 began to evolve into an integrated prototype which could be described by a position at the center of our model.

A version of the user interface developed in Example 2 was implemented in the prototype in Example 3. Results of other prototypes were also integrated. This enabled a more complete user test of features and user interface to take place.

This set of three prototypes from the same project shows how a design problem can be simultaneously approached from multiple points of view. Design questions of role, look and feel, and implementation were explored concurrently by the team with the three separate prototypes. The purpose of the model is to make it easier to develop and subsequently communicate about this kind of prototyping strategy.

16.4 Further Examples

In this section we present twelve more examples of prototypes taken from real projects, and discuss them in terms of the model. Examples are divided into four categories which correspond to the four main regions of the model, as indicated in Figure 3. The first three categories correspond to prototypes with a strong bias toward one of the three corners: role, look and feel, and implementation prototypes, respectively. Integration prototypes occupy the middle of the model: they explore a balance of questions in all three dimensions.

Figure 4. Relationship of the prototypes (Examples 4-7) to the model.

16.4.1 Role Prototypes

Role prototypes are those which are built primarily to investigate questions of what an artifact could do for a user. They describe the functionality that a user might benefit from, with little attention to how the artifact would look and feel, or how it could be made to actually work. Designers find such prototypes useful to show their design teams what the target role of the artifact might be; to communicate that role to their supporting organization; and to evaluate the role in user studies.

A Portable Notebook Computer: The paper storyboard shown in Example 4 was an early prototype of a portable notebook computer for students which would accept both pen and finger input. The scenario shows a student making notes, annotating a paper, and marking pages for later review in a computer notebook. The designer presented the storyboard to her design team to focus discussion on the issues of what functionality the notebook should provide and how it might be controlled through pen and finger interaction. In terms of the model, this prototype primarily explored the role of the notebook by presenting a rough task scenario for it. A secondary consideration was a rough approximation of the user interface. Its marker, shown in Figure 4, is therefore positioned near the role corner of the model and a little toward look and feel.

Example 4. Storyboard for a portable notebook computer [E4: Vertelney 1990].

Storyboards like this one are considered to be effective design tools by many designers because they help focus design discussion on the role of an artifact very early on. However, giving them status as prototypes is not common because the medium is paper and thus seems very far from the medium of an interactive computer system. We consider this storyboard to be a prototype because it makes a concrete representation of a design idea and serves the purpose of asking and answering design questions. Of course, if the designer needed to evaluate a user's reaction to seeing the notebook or to using the pen-and-finger interaction, it would be necessary to build a prototype which supported direct interaction. However, it might be wasteful to do so before considering design options in the faster, lighter-weight medium of pencil and paper.

An Operating System User Interface: Example 5 shows a screen view of a prototype that was used to explore the design of a new operating system. The prototype was an interactive story: it could only be executed through a single, ordered sequence of interactions. Clicking with a cursor on the mailbox picture opened a mail window; then clicking on the voice tool brought up a picture of some sound tools; and so on. To demonstrate the prototype, the designer sat in front of a computer and play-acted the role of a user opening her mail, replying to it, and so forth. The prototype was used in design team discussions and also demonstrated to project managers to explain the current design direction. According to the model, this prototype primarily explored the role that certain features of the operating system could play in a user's daily tasks. It was also used to outline very roughly how its features would be portrayed and how a user would interact with it. As in the previous example, the system's implementation was not explored. Its marker is shown in Figure 4.

Example 5. Interactive story for an operating system interface [E5: Vertelney and Wong 1990].

To make the prototype, user interface elements were hand-drawn and scanned in. Transitions between steps in the scenario were made interactive in Macromedia Director. This kind of portrayal of on-screen interface elements as rough and hand-drawn was used in order to focus design discussion on the overall features of a design rather than on specific details of look and feel or implementation (Wong, 1992). Ironically, while the design team understood the meaning of the hand-drawn graphics, other members of the organization became enamored with the sketchy style to the extent that they considered using it in the final artifact. This result was entirely at odds with the original reasons for making a rough-looking prototype. This example shows how the effectiveness of some kinds of prototypes may be limited to a specific kind of audience.

The Knowledge Navigator: Example 6 shows a scene from Apple Computer's Knowledge Navigator™ video. The videotape tells a day-in-the-life story of a professor using a futuristic notebook computer (Dubberly and Mitch, 1987). An intelligent agent named "Phil" acts as his virtual personal assistant, finding information related to a lecture, reminding him of his mother's birthday, and connecting him with other professors via video-link. The professor interacts with Phil by talking, and Phil apparently recognizes everything said as well as a human assistant would.

Based on the model, the Knowledge Navigator is identified primarily as a prototype which describes the role that the notebook would play in such a user's life.

Example 6. Knowledge Navigator™ vision video for a future notebook computer [E6: Dubberly and Mitch 1987].

The story is told in great detail, and it is clear that many decisions were made about what to emphasize in the role. The video also shows specific details of appearance, interaction, and performance. However, they were not intended by the designers to be prototypes of look and feel. They were merely place-holders for the actual design work which would be necessary to make the product really work. Thus its marker goes directly on the role corner (Figure 4).

Thanks to the video's special effects, the scenario of the professor interacting with the notebook and his assistant looks like a demonstration of a real product. Why did Apple make a highly produced prototype when the previous examples show that a rapid paper storyboard or a sketchy interactive prototype were sufficient for designing a role and telling a usage story? The answer lies in the kind of audience. The tape was shown publicly and to Apple employees as a vision of the future of computing. Thus the audience of the Knowledge Navigator was very broad--including almost anyone in the world. Each of the two previous role design prototypes was shown to an audience which was well informed about the design project. A rough hand-drawn prototype would not have made the idea seem real to the broad audience the video addressed: high resolution was necessary to help people concretely visualize the design. Again, while team members learn to interpret abstract kinds of prototypes accurately, less expert audiences cannot normally be expected to understand such approximate representations.

The Integrated Communicator: Example 7 shows an appearance model of an Integrated Communicator created for customer research into alternate presentations of new technology (ID Magazine, 1995). It was one of three presentations of possible mechanical configurations and interaction designs, each built to the same high finish and accompanied by a video describing on-screen interactions. In the study, the value of each presentation was evaluated relative to the others, as perceived by study subjects during one-on-one interviews. The prototype was used to help subjects imagine such a product in the store and in their homes or offices, and thus to evaluate whether they would purchase such a product, how much they would expect it to cost, what features they would expect, etc.

Example 7. Appearance model for the Integrated Communicator [E7: Udagawa 1995].

The prototype primarily addresses the role of the product, by presenting carefully designed cues which imply a telephone-like role and look and feel. Figure 4 shows its marker near the role corner of the model. As with the Knowledge Navigator, the very high resolution look and feel was a means of making the design as concrete as possible to a broad audience. In this case, however, it also enabled a basic interaction design strategy to be worked out and demonstrated. The prototype did not address implementation.

The key feature of this kind of prototype is that it is a concrete and direct representation, as visually finished as actual consumer products. These attributes encourage an uncoached person to directly relate the design to their own environment, and to the products they own or see in stores.

High quality appearance models are costly to build. There are two common reasons for investing in one: to get a visceral response by making the design seem "real" to any audience (design team, organization, and potential users); and to verify the intended look and feel of the artifact before committing to production tooling. An interesting side effect of this prototype was that its directness made it a powerful prop for promoting the project within the organization.

16.4.2 Look and Feel Prototypes

Look and feel prototypes are built primarily to explore and demonstrate options for the concrete experience of an artifact. They simulate what it would be like to look at and interact with, without necessarily investigating the role it would play in the user's life or how it would be made to work. Designers make such prototypes to visualize different look and feel possibilities for themselves and their design teams. They ask users to interact with them to see how the look and feel could be improved. They also use them to give members of their supporting organization a concrete sense of what the future artifact will be like.

Figure 5. Relationship of Examples 8-10 to the model.

A Fashion Design Workspace: The prototype shown in Example 8 was developed to support research into collaboration tools for fashion designers (Hill et al, 1993; Scaife et al, 1994). A twenty-minute animation, it presented the concept design for a system for monitoring garment design work.

It illustrated in considerable detail the translation of a proven paper-based procedure into a computer-based system with a visually rich, direct manipulation user interface. The prototype's main purposes were to confirm to the design team that an engaging and effective look and feel could be designed for this application, and to convince managers of the possibilities of the project. It was presented to users purely for informal discussion.

This is an example of a look and feel prototype. The virtue of the prototype was that it enabled a novel user interface design to be developed without having first to implement complex underlying technologies. While the role was inherited from existing fashion design practice, the prototype also demonstrated new options offered by the new computer-based approach. Thus, Figure 5 shows its marker in the look and feel area of the model.

One issue with prototypes like this one is that inexperienced audiences tend to believe them to be more functional than they are, just by virtue of being shown on a computer screen. When this prototype was shown, the designers found they needed to take great care to explain that the design was not implemented.

Example 8. Animation of the look and feel of a fashion design workspace [E8: Hill, 1992].

Example 9. Look and feel simulation prototypes for a child's toy [E9: Bellman et al, 1993].

A Learning Toy: The "GloBall" project was a concept for a children's toy: a ball that would interact with children who played with it. Two prototypes from the project are shown, disassembled, in Example 9. The design team wanted the ball to speak back to kids when they spoke to it, and to roll towards or away from them in reaction to their movements. The two prototypes were built to simulate these functions separately. The ball on the left had a walkie-talkie which was concealed in use. A hidden operator spoke into a linked walkie-talkie to simulate the ball's speech while a young child played with it. Similarly, the ball on the right had a radio-controlled car which was concealed in use. A hidden operator remotely controlled the car, thus causing the ball to roll around in response to the child's actions.

Example 10. Pizza-box prototype of an architect's computer [E10: Apple Design Project, 1992].

As indicated by the marker in Figure 5, both prototypes were used to explore the toy's look and feel from a child's viewpoint, and to a lesser extent to evaluate the role that the toy would play. Neither seriously addressed implementation. The designers of these very efficient prototypes wanted to know how a child would respond to a toy that appeared to speak and move of its own free will. They managed to convincingly simulate novel and difficult-to-implement technologies such as speech and automotion, for minimal cost and using readily available components. By using a "man behind the curtain" (or "Wizard of Oz") technique, the designers were able to present the prototypes directly to children and to directly evaluate their effect.
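The "Wizard of Oz" set-up can be sketched in miniature: the artifact shown to the user has no autonomy at all, and simply replays behaviours queued by a hidden operator. The class and method names below are invented for illustration; this is not code from the GloBall project, whose prototypes used no software at all.

```cpp
#include <deque>
#include <string>

// Wizard-of-Oz simulation: the "smart" toy has no intelligence of its
// own. A hidden operator queues behaviours, and the toy replays them so
// that it appears to act of its own free will.
class WizardOfOzToy {
public:
    // Called by the hidden operator, out of the user's sight.
    void operatorCommands(const std::string& behaviour) {
        pending_.push_back(behaviour);
    }
    // What the observing child sees the toy "decide" to do next.
    std::string nextObservedBehaviour() {
        if (pending_.empty()) return "idle";
        std::string b = pending_.front();
        pending_.pop_front();
        return b;
    }
private:
    std::deque<std::string> pending_;
};
```

The design point this illustrates is that only the observable behaviour needs to be genuine; the mechanism behind it can be a person.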

An Architect's Computer: This example concerned the design of a portable computer for architects who need to gather a lot of information during visits to building sites. One of the first questions the designers explored was what form would be appropriate for their users. Without much ado they weighted the pizza box shown in Example 10 to the expected weight of the computer, and gave it to an architect to carry on a site visit. They watched how he carried the box, what else he carried with him, and what tasks he needed to do during the visit. They saw that the rectilinear form and weight were too awkward, given the other materials he carried with him, and this simple insight led them to consider a softer form. As shown by its marker, this is an example of a rough look and feel prototype (Figure 5). Role was also explored in a minor way by seeing the context that the artifact would be used in.

The pizza box was a very efficient prototype. Spending virtually no time building it or considering options, the students got useful feedback on a basic design question: what physical form would be best for the user. From what they learned in their simple field test, they knew immediately that they should try to think beyond standard rectilinear notebook computer forms. They began to consider many different options, including designing the computer to feel more like a soft shoulder bag.

Figure 6. Relationship of Examples 11 and 12 to the model.

16.4.3 Implementation Prototypes

Some prototypes are built primarily to answer technical questions about how a future artifact might actually be made to work. They are used to discover methods by which adequate specifications for the final artifact can be achieved, without having to define its look and feel or the role it will play for a user. (Some specifications may be unstated, and may include externally imposed constraints, such as the need to reuse existing components or production machinery.) Designers make implementation prototypes as experiments for themselves and the design team, to demonstrate to their organization the technical feasibility of the artifact, and to get feedback from users on performance issues.

A Digital Movie Editor: Some years ago it was not clear how much interactivity could be added to digital movies playing on personal computers. Example 11 shows a picture of a prototype that was built to investigate solutions to this technical challenge. It was an application, written in the C programming language to run on an Apple Macintosh computer. It offered a variety of movie data-processing functionality such as controlling various attributes of movie play. The main goal of the prototype was to allow marking of points in a movie to which scripts (which added interactivity) would be attached. As indicated by the marker in Figure 6, this was primarily a carefully planned implementation prototype. Many options were evaluated about the best way to implement its functions. The role that the functions would play was less well defined. The visible look and feel of the prototype was largely incidental: it was created by the designer almost purely to demonstrate the available functionality, and was not intended to be used by others.

Example 11. Working prototype of a digital movie editor [E11: Degen, 1994].

This prototype received varying responses when demonstrated to a group of designers who were not members of the movie editor design team. When the audience understood that an implementation design was being demonstrated, discussion was focused productively. At other times it became focused on problems with the user interface, such as the multiple cascading menus, which were hard to control and visually confusing. In these cases, discussion was less productive: the incidental user interface got in the way of the intentional implementation.

The project leader shared some reflections after this somewhat frustrating experience. He said that part of his goal in pursuing a working prototype alone was to move the project through an organization that respected this kind of prototype more than "smoke and mirrors" prototypes, ones which only simulate functionality. He added that one problem might have been that the user interface was neither good enough nor bad enough to avoid misunderstandings. The edit list, which allowed points to be marked in movies, was a viable look and feel design, while the cascading menus were not. For the audience that the prototype was shown to, it might have been more effective to stress the fact that look and feel were not the focus of the prototype; and perhaps, time permitting, to have complemented this prototype with a separate look and feel prototype that explained their intentions in that dimension.

A Fluid Dynamics Simulation System: Example 12 shows a small part of the C++ program listing for a system for simulating gas flows and combustion in car engines, part of an engineering research project (Hill, 1993). One goal of this prototype was to demonstrate the feasibility of object-oriented programming using the C++ language in place of procedural programs written in the older FORTRAN language. Object-oriented programming can in theory lead to increased software reuse, better reliability, and easier maintenance. Since an engine simulation may take a week to run on the fastest available computers and is extremely memory-intensive, it was important to show that the new approach did not incur excessive performance or memory overheads. The program listing shown was the implementation of the operation to copy one list of numbers to another. When tested, it was shown to be faster than the existing FORTRAN implementation. The prototype was built primarily for the design team's own use, and eventually used to create a deployable system. The marker in Figure 6 indicates that this prototype primarily explored implementation.

Example 12. C++ program sample from a fluid dynamics simulation system [E12: Hill, 1993].

Other kinds of implementation prototypes include demonstrations of new algorithms (e.g., a graphical rendering technique or a new search technology), and trial conversions of existing programs to run in new environments (e.g., converting a program written in the C language to the Java language). Implementation prototypes can be hard to build, and since they actually work, it is common for them to find their way directly into the final system. Two problems arise from this dynamic: firstly, programs developed mainly to demonstrate feasibility may turn out in the long term to be difficult to maintain and develop; and secondly, their temporary user interfaces may never be properly redesigned before the final system is released. For these reasons it is often desirable to treat even implementation prototypes as disposable, and to migrate successful implementation designs to a new integrated prototype as the project progresses.
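The listing itself is not reproduced here, but the kind of operation it described, copying one list of numbers to another through a C++ abstraction, can be sketched as follows. The class name and layout are our illustrative assumptions, not code from the thesis; the point is that the overloaded assignment reduces to the same tight loop as procedural FORTRAN, so the abstraction need not cost performance.

```cpp
#include <cstddef>

// Minimal numeric list with an element-wise copy operation. The copy
// loop is simple enough for a compiler to optimize as aggressively as
// equivalent hand-written procedural code.
class ScalarList {
public:
    explicit ScalarList(std::size_t n) : n_(n), data_(new double[n]()) {}
    ~ScalarList() { delete[] data_; }
    ScalarList(const ScalarList&) = delete;  // keep the sketch minimal

    // Copy the contents of another list, assumed to be the same length.
    ScalarList& operator=(const ScalarList& other) {
        for (std::size_t i = 0; i < n_; ++i) data_[i] = other.data_[i];
        return *this;
    }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return n_; }
private:
    std::size_t n_;
    double* data_;
};
```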

16.4.4 Integration Prototypes

Integration prototypes are built to represent the complete user experience of an artifact. Such prototypes bring together the artifact's intended design in terms of role, look and feel, and implementation. Integrated prototypes help designers to balance and resolve constraints arising in different design dimensions; to verify that the design is complete and coherent; and to find synergy in the design of the integration itself. In some cases the integration design may become the unique innovation or feature of the final artifact. Since the user's experience of an artifact ultimately combines all three dimensions of the model, integration prototypes are most able to accurately simulate the final artifact. Since they may need to be as complex as the final artifact, they are the most difficult and time consuming kinds of prototypes to build. Designers make integration prototypes to understand the design as a whole, to show their organizations a close approximation to the final artifact, and to get feedback from users about the overall design.

Figure 7. Relationship of Examples 13-15 to the model.

The Sound Browser: The "SoundBrowser" prototype shown in Example 13 was built as part of a larger project which investigated uses of audio for personal computer users (Degen et al, 1992). The prototype was built in C to run on a Macintosh. It allowed a user to browse digital audio data recorded on a special personal tape recorder equipped with buttons for marking points in the audio. The picture shows the SoundBrowser's visual representation of the audio data, showing the markers below the sound display. A variety of functions were provided for reviewing sound, such as high-speed playback and playback of marked segments of audio.
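The marking-and-playback behaviour can be sketched abstractly (an illustrative C++ sketch, not the SoundBrowser's actual implementation, which was written in C): each button press records a sample position, and "playback of marked segments" amounts to extracting a window of audio starting at each marker.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Given marker positions (sample indices recorded when the user pressed
// the marking button), compute the [start, end) sample ranges to play
// back for a fixed review window, clamped to the end of the recording.
std::vector<std::pair<std::size_t, std::size_t>>
markedSegments(const std::vector<std::size_t>& markers,
               std::size_t windowSamples, std::size_t totalSamples) {
    std::vector<std::pair<std::size_t, std::size_t>> segments;
    for (std::size_t m : markers) {
        std::size_t end = m + windowSamples;
        if (end > totalSamples) end = totalSamples;
        segments.push_back({m, end});
    }
    return segments;
}
```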

This prototype earns a position right in the center of the model, as shown in Figure 7. All three dimensions of the model were explored and represented in the prototype. The role of the artifact was well thought out, being driven initially by observations of what users currently do to mark and play back audio, and then by iteratively designed scenarios of how it might be done more efficiently if electronic marking and viewing functions were offered. The look and feel of the prototype went through many visual design iterations.

Example 13. Integrated prototype of a sound browser [E13: Degen, 1993].

The implementation was redesigned several times to meet the performance needs of the desired high-speed playback function.

When the SoundBrowser was near completion it was prepared for a user test. One of the features which the design team intended to evaluate was the visual representation of the sound in the main window. They wanted to show users several alternatives to understand their preferences. The programmer who built the SoundBrowser had developed most of the alternatives. In order to refine these and explore others, two other team members copied screen-shots from the tool into a different visual tool, a pixel-painting application, where they experimented with modifications. This was a quick way to try out options, in temporary isolation from other aspects of the artifact. It was far easier to do this in a visual design tool than by programming in C. When finished, the new options were programmed into the integrated prototype. This example shows the value of using different tools for different kinds of design exploration, and how even at the end of a project simple, low-fidelity prototypes might be built to solve specific problems.

Example 14. Integration prototype of the "Pile" metaphor for information retrieval [E14: Rose, 1993].

The Pile Metaphor: The prototype shown in Example 14 was made as part of the development of the "pile" metaphor, a user interface element for casual organization of information (Mander et al, 1992; Rose et al, 1993). It represented the integration of designs developed in several other prototypes which independently explored the look and feel of piles, "content-aware" information retrieval, and the role that piles could play as a part of an operating system.

In the pile metaphor, each electronic document was represented by a small icon or "proxy", several of which were stacked to form a pile. The contents of the pile could be quickly reviewed by moving the arrow cursor over it. While the cursor was over a particular document, the "viewing cone" to the right displayed a short text summary of the document.

This prototype was shown to designers, project managers, and software developers as a proof of concept of the novel technology. The implementation design in this prototype might have been achieved with virtually no user interface: just text input and output. However, since the prototype was to be shown to a broad audience, an integrated style of prototype was chosen, both to communicate the implementation point and to verify that the piles representation was practically feasible. It helped greatly that the artifact's role and look and feel could be directly inherited from previous prototypes. Figure 7 shows its marker on the model.
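The pile-browsing interaction, where the cursor position over the stacked proxies drives what the viewing cone displays, can be sketched as a simple hit-test. This is an illustrative reconstruction, not the prototype's actual code; the structure and function names are invented.

```cpp
#include <string>
#include <vector>

// Each document in a pile is represented by a small proxy icon with a
// screen rectangle and a short text summary.
struct Proxy {
    int x, y, w, h;
    std::string summary;
};

// Return the summary the viewing cone should display: the topmost proxy
// under the cursor (proxies later in the vector are drawn on top), or
// an empty string when the cursor is not over the pile.
std::string viewingConeText(const std::vector<Proxy>& pile, int cx, int cy) {
    for (auto it = pile.rbegin(); it != pile.rend(); ++it) {
        if (cx >= it->x && cx < it->x + it->w &&
            cy >= it->y && cy < it->y + it->h) {
            return it->summary;
        }
    }
    return "";
}
```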

A Garment History Browser: The prototype in Example 15 was a working system which enabled users to enter and retrieve snippets of information about garment designs via a visually rich user interface (Hill et al, 1993; Scaife et al, 1994). The picture shows the query tool which was designed to engage fashion designers and provide memorable visual cues. The prototype was designed for testing in three corporations with a limited set of users' actual data, and presented to users in interviews. It was briefly demonstrated, then users were asked to try queries and enter remarks about design issues they were currently aware of.

This prototype was the end-result of a progression from an initial focus on role (represented by verbal usage scenarios), followed by rough look and feel prototypes and an initial implementation. Along the way various ideas were explored, refined or rejected. The working tool, built in Allegiant SuperCard™, required two months' intensive work by two designers. In retrospect the designers had mixed feelings about it. It was highly motivating to users to be able to manipulate real user data through a novel user interface, and much was learned about the design. However, the designers also felt that they had had to invest a large amount of time in making the prototype, yet had only been able to support a very narrow role compared to the breadth shown in the animation shown in Example 8. Many broader design questions remained unanswered.

Example 15. Integrated prototype of a garment history browser [E15: Hill and Kamlish, 1992].

16.5 Summary

In this chapter, we have proposed a change in the language used by designers to think and talk about prototypes of interactive artifacts. Much current terminology centers on attributes of prototypes themselves: the tools used to build them, or how refined-looking or -behaving they are. But tools can be used in many different ways, and resolution can be misleading. We have proposed a shift in attention to focus on questions about the design of the artifact itself: What role will it play in a user's life? How should it look and feel? How should it be implemented? The model that we have introduced can be used by designers to divide any design problem into these three classes of questions, each of which may benefit from a different approach to prototyping.
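As a concrete illustration of the model (our own sketch, not part of the original chapter's apparatus), a prototype's marker position can be treated as three normalized weights, one per question:

```cpp
// A prototype's marker on the role / look-and-feel / implementation
// model, expressed as three weights that sum to 1 (like barycentric
// coordinates in the model's triangle). Purely illustrative.
struct ModelMarker {
    double role;
    double lookAndFeel;
    double implementation;
};

// Normalize raw emphasis scores into a marker position.
ModelMarker marker(double role, double lookAndFeel, double implementation) {
    double total = role + lookAndFeel + implementation;
    return {role / total, lookAndFeel / total, implementation / total};
}
```

On this reading, an appearance model sits near the role and look-and-feel weights, while a feasibility prototype puts nearly all of its weight on implementation.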

We have described a variety of prototypes from real projects, and have shown how the model can be used to communicate about their purposes. Several practical suggestions for designers have been raised by the examples:

Define "prototype" broadly. Efficient prototypes produce answers to their designers' most important questions in the least amount of time. Sometimes very simple representations make highly effective prototypes: e.g., the pizza-box prototype of an architect's computer [Example 10] and the storyboard notebook [Example 1]. We define a prototype as any representation of a design idea, regardless of medium; and designers as the people who create them, regardless of their job titles.

Build multiple prototypes. Since interactive artifacts can be very complex, it may be impossible to create an integrated prototype in the formative stages of a project, as in the 3D space-planning example [Examples 1, 2, and 3]. Choosing the right focused prototypes to build is an art in itself. Be prepared to throw some prototypes away, and to use different tools for different kinds of prototypes.

Figure 8. The example prototypes:

  1. 3D space-planning (role)
  2. 3D space-planning (look and feel)
  3. 3D space-planning (implementation)
  4. Storyboard for portable notebook computer
  5. Interactive story, operating system user interface
  6. Vision video, notebook computer
  7. Appearance model, integrated communicator
  8. Animation, fashion design workspace
  9. Look and feel simulation, child's toy
  10. Pizza-box, architect's computer
  11. Working prototype, digital movie editor
  12. C++ program listing, fluid dynamics simulation
  13. Integrated prototype, sound browser
  14. Integrated prototype, pile metaphor
  15. Integrated prototype, garment history browser

Know your audience. The necessary resolution and fidelity of a prototype may depend most on the nature of its audience. A rough role prototype such as the interactive storyboard [Example 4] may work well for a design team but not for members of the supporting organization. Broader audiences may require higher resolution representations. Some organizations expect to see certain kinds of prototypes: implementation designs are often expected in engineering departments, while look-and-feel and role prototypes may rule in a visual design environment.

Be clear about what design questions are being explored with a given prototype, and what are not. Communicating the specific purposes of a prototype to its audience is a critical aspect of its use. It is up to the designer to prepare an audience for viewing a prototype. Prototypes themselves do not necessarily communicate their purpose. It is especially important to clarify what is and what is not addressed by a prototype when presenting it to any audience beyond the immediate design team.


By focusing on the purpose of the prototype (that is, on what it prototypes), we can make better decisions about the kinds of prototypes to build. With a clear purpose for each prototype, we can better use prototypes to think and communicate about design.

16.6 Acknowledgments

Special thanks are due to Thomas Erickson for guidance with this chapter, and to our many colleagues whose prototypes we have cited, for their comments on early drafts. We would also like to acknowledge S. Joy Mountford whose leadership of the Human Interface Group at Apple created an atmosphere in which creative prototyping could flourish. Finally, thanks to James Spohrer, Lori Leahy, Dan Russell, and Donald Norman at Apple Research Labs for supporting us in writing this chapter.

16.7 Prototype Credits

We credit here the principal designer and design team of each example prototype shown.

[E1] Stephanie Houde, [E2] Stephanie Houde, [E3] Michael Chen. (1990) © Apple Computer, Inc. Project team: Penny Bauersfeld, Michael Chen, Lewis Knapp (project leader), Laurie Vertelney and Stephanie Houde.

[E4] Laurie Vertelney. (1990) © Apple Computer Inc. Project team: Michael Chen, Thomas Erickson, Frank Leahy, Laurie Vertelney (project leader).

[E5] Laurie Vertelney and Yin Yin Wong. (1990) © Apple Computer Inc. Project team: Richard Mander, Gitta Salomon (project leader), Ian Small, Laurie Vertelney, Yin Yin Wong.

[E6] Dubberly, H. and Mitch, D. (1987) © Apple Computer, Inc. The Knowledge Navigator (videotape).

[E7] Masamichi Udagawa. (1995) © Apple Computer Inc. Project team: Charles Hill, Heiko Sacher, Nancy Silver, Masamichi Udagawa.

[E8] Charles Hill. (1992) © Royal College of Art, London. Design team: Gillian Crampton Smith, Eleanor Curtis, Charles Hill, Stephen Kamlish (all of the RCA), Mike Scaife (Sussex University, UK), and Philip Joe (IDEO, London).

[E9] Tom Bellman, Byron Long, Abba Lustgarten. (1993) University of Toronto, 1993 Apple Design Project. © Apple Computer Inc.

[E10] 1992 Apple Design Project. © Apple Computer, Inc.

[E11] Leo Degen (1994) © Apple Computer Inc. Project team: Leo Degen, Stephanie Houde, Michael Mills (team leader), David Vronay.

[E12] Charles Hill (1993). Doctoral thesis project, Imperial College of Science, Technology and Medicine, London, UK. Project team: Charles Hill, Henry Weller.

[E13] Leo Degen (1993) © Apple Computer Inc. Project team: Leo Degen, Richard Mander, Gitta Salomon (team leader), Yin Yin Wong.

[E14] Daniel Rose. (1993) © Apple Computer, Inc. Project team: Penny Bauersfeld, Leo Degen, Stephanie Houde, Richard Mander, Ian Small, Gitta Salomon (team leader), Yin Yin Wong.

[E15] Charles Hill and Stephen Kamlish. (1992) © Royal College of Art, London. Design team: Gillian Crampton Smith, Eleanor Curtis, Charles Hill, Stephen Kamlish (all of the RCA), and Mike Scaife (Sussex University, UK).

16.8 References

Degen, L., Mander, R., Salomon, G. (1992). Working with Audio: Integrating Personal Tape Recorders and Desktop Computers. Human Factors in Computing Systems: CHI'92 Conference Proceedings. New York: ACM, pp. 413-418.

Dubberly, H. and Mitch, D. (1987). The Knowledge Navigator. Apple Computer, Inc. videotape.

Ehn, P., Kyng, M. (1991). Cardboard Computers: Mocking-it-up or Hands-on the Future. Design at Work: Cooperative Design of Computer Systems (ed. Greenbaum, J., and Kyng, M.). Hillsdale, NJ: Lawrence Erlbaum, pp. 169-195.

Erickson, T. (1995). Notes on Design Practice: Stories and Prototypes as Catalysts for Communication. Envisioning Technology: The Scenario as a Framework for the System Development Life Cycle (ed. Carroll, J.). Addison-Wesley.

Hill, C. (1993). Software Design for Interactive Engineering Simulation. Doctoral Thesis. Imperial College of Science, Technology and Medicine, University of London.

Hill, C., Crampton Smith, G., Curtis, E., Kamlish, S. (1993). Designing a Visual Database for Fashion Designers. Human Factors in Computing Systems: INTERCHI'93 Adjunct Proceedings. New York: ACM, pp. 49-50.

Houde, S. (1992). Iterative Design of an Interface for Easy 3-D Direct Manipulation. Human Factors in Computing Systems: CHI'92 Conference Proceedings. New York: ACM, pp. 135-142.

I.D. Magazine (1995). Apple's Shared Conceptual Model. The International Design Magazine: 41st Annual Design Review, July-August 1995, pp. 206-207.

Kim, S. (1990). Interdisciplinary Collaboration. The Art of Human Computer Interface Design (ed. B. Laurel). Reading, MA: Addison-Wesley, pp. 31-44.

Mander, R., Salomon, G., Wong, Y.Y. (1992). A 'Pile' Metaphor for Supporting Casual Organization of Information. Human Factors in Computing Systems: CHI'92 Conference Proceedings. New York: ACM, pp. 627-634.

Rose, D.E., Mander, R., Oren, T., Ponceleón, D.B., Salomon, G., Wong, Y. (1993). Content Awareness in a File System Interface: Implementing the 'Pile' Metaphor for Organizing Information. Research and Development in Information Retrieval: SIGIR Conference Proceedings. Pittsburgh, PA: ACM, pp. 260-269.

Scaife, M., Curtis, E., Hill, C. (1994). Interdisciplinary Collaboration: a Case Study of Software Development for Fashion Designers. Interacting with Computers, Vol. 6, No. 4, pp. 395-410.

Schrage, M. (1996). Cultures of Prototyping. Bringing Design to Software (ed. T. Winograd). USA: ACM Press. pp. 191-205.

Wong, Y.Y. (1992). Rough and ready prototypes: Lessons from graphic design. Human Factors in Computing Systems: CHI'92 Conference, Posters and Short Talks. New York: ACM, pp. 83-84.

COGNITIVE SCIENCE 19, 265-288 (1995)

How a Cockpit Remembers Its Speeds

EDWIN HUTCHINS

University of California, San Diego

Cognitive science normally takes the individual agent as its unit of analysis. In many human endeavors, however, the outcomes of interest are not determined entirely by the information processing properties of individuals. Nor can they be inferred from the properties of the individual agents, alone, no matter how detailed the knowledge of the properties of those individuals may be. In commercial aviation, for example, the successful completion of a flight is produced by a system that typically includes two or more pilots interacting with each other and with a suite of technological devices. This article presents a theoretical framework that takes a distributed, socio-technical system rather than an individual mind as its primary unit of analysis. This framework is explicitly cognitive in that it is concerned with how information is represented and how representations are transformed and propagated in the performance of tasks. An analysis of a memory task in the cockpit of a commercial airliner shows how the cognitive properties of such distributed systems can differ radically from the cognitive properties of the individuals who inhabit them.

Thirty years of research in cognitive psychology and other areas of cognitive science have given us powerful models of the information processing properties of individual human agents. The cognitive science approach provides a very useful frame for thinking about thinking. When this frame is applied to the individual human agent, one asks a set of questions about the mental

An initial analysis of speed bugs as cognitive artifacts was completed in November of 1988. Since then, my knowledge of the actual uses of speed bugs and my understanding of their role in cockpit cognition has changed dramatically. Some of the ideas in this paper were presented in a paper titled, “Information Flow in the Cockpit” at the American Institute for Aeronautics and Astronautics symposium, “Challenges in Aviation Human Factors: The National Plan,” in Vienna, Virginia. The current draft also benefited from the comments of members of the Flight Deck Research Group of the Boeing Commercial Airplane Company and from the participants in the second NASA Aviation Safety/Automation program researchers meeting. Thanks to Hank Strub, Christine Halverson, and Everett Palmer for discussions and written comments on earlier versions of this paper.

This research was supported by grant NCC 2-591 from the Ames Research Center of the National Aeronautics and Space Administration in the Aviation Safety/Automation Program. Everett Palmer served as Technical Monitor.

Correspondence and requests for reprints should be sent to Edwin Hutchins, Department of Cognitive Science, University of California, San Diego, La Jolla, CA 92093-0515; or e-mail to: hutchins@cogsci.ucsd.edu.


processes that organize the behavior of the individual.1 In particular, one asks how information is represented in the cognitive system and how representations are transformed, combined, and propagated through the system (Simon, 1981). Cognitive science thus concerns itself with the nature of knowledge structures and the processes that operate on them. The properties of these representations inside the system and the processes that operate on representations are assumed to cause or explain the observed performance of the cognitive system as a whole.

1 This notion is widespread in cognitive science. See Simon & Kaplan, 1989. The canonical statement of what is currently accepted as the standard position appears in Newell & Simon, 1972. See also Wickens & Flach, 1988 for a direct application of this perspective to aviation.

2 March and Simon staked out this territory with their seminal book, Organizations, in 1958. For a review of conceptions of organizations see Morgan, 1986.

In this paper, I will attempt to show that the classical cognitive science approach can be applied with little modification to a unit of analysis that is larger than a person. One can still ask the same questions of a larger, socio-technical system that one would ask of an individual. That is, we wish to characterize the behavioral properties of the unit of analysis in terms of the structure and the processing of representations that are internal to the system. With the new unit of analysis, many of the representations can be observed directly, so in some respects, this may be a much easier task than trying to determine the processes internal to the individual that account for the individual’s behavior. Posing questions in this way reveals how systems that are larger than an individual may have cognitive properties in their own right that cannot be reduced to the cognitive properties of individual persons (Hutchins, 1995). Many outcomes that concern us on a daily basis are produced by cognitive systems of this sort.

Thinking of organizations as cognitive systems is not new, of course.2 What is new is the examination of the role of the material media in which representations are embodied and of the physical processes that propagate representations across media. Applying the cognitive science approach to a larger unit of analysis requires attention to the details of these processes as they are enacted in the activities of real persons interacting with real material media. The analysis presented here shows that structure in the environment can provide much more than external memory (Norman, 1993).

I will take the cockpit of a commercial airliner as my unit of analysis and will show how the cockpit system performs the cognitive tasks of computing and remembering a set of correspondences between airspeed and wing configuration. I will not present extended examples from actual observations because I don’t know how to render such observations meaningful for a non-flying audience without swamping the reader in technical detail. Instead, I will present a somewhat stylized account of the use of the small set of tools in the performance of this simple task, which is accomplished every time an airliner makes an approach to landing.

COCKPIT SPEEDS

267

The procedures described below come straight from the pages of a major airline’s operations manual for a midsized jet, the McDonnell Douglas MD-80. Similar procedures exist for every make and model airliner. The explanations of the procedures are informed by my experience as a pilot and as an ethnographer of cockpits. In conducting research on aviation safety during the past 6 years,3 I have made more than 100 flights as an observer member of crews in the cockpits of commercial airliners. These observations spanned a wide range of planes, including old and new technology cockpits, domestic and international (trans-oceanic) operations, and both foreign and US-flag carriers.

3 This research was performed under a contract from the flight human factors branch of the NASA Ames Research Center. In addition to my activities as an observer, I hold a commercial pilot certificate with multiengine and instrument airplane ratings. I have completed the transition training course (both ground school and full-flight) for the Boeing 747-400 and the ground schools for the McDonnell Douglas MD-88, and the Airbus A320. I am grateful to the Boeing Commercial Airplane group, McDonnell Douglas, and America West Airlines for these training opportunities.

APPLYING THE COGNITIVE FRAME TO THE COCKPIT SYSTEM

If we want to explain the information processing properties of individuals, we have no choice but to attempt to infer what is inside the individual’s mind. Cognitive scientists do this by constructing carefully selected contexts for eliciting behavior from which they can attribute internal states to actors. However, if we take the cockpit system as the unit of analysis, we can look inside it and directly observe many of the phenomena of interest. In particular, we can directly observe the many representations that are inside the cockpit system, yet outside the heads of the pilots. We can do a lot of research on the cognitive properties of such a system (i.e., we can give accounts of the system’s behavioral properties in terms of its internal representations), without saying anything about the processes that operate inside individual actors (Hutchins, 1990, 1991, 1995). This suggests that rather than trying to map the findings of cognitive psychological studies of individuals directly onto the individual pilots in the cockpit, we should map the conceptualization of the cognitive system onto a new unit of analysis: the cockpit as a whole.

REMEMBERING SPEEDS

Why Speeds Must be Remembered

For an illustration of the application of the cognitive science frame to the cockpit system, consider the events having to do with remembering speeds in the cockpit of a midsize civil transport jet (a McDonnell Douglas MD-80) on a typical descent from a cruise altitude above 30,000 feet, followed by an

268

HUTCHINS

instrument landing system (ILS) approach and landing. Virtually all of the practices described in this paper are mandated by federal regulations, airline policy, or both. A reader may wonder how many crews do these things. The answer is that nearly all of them do these things on every flight. Exceptions are extremely rare. In all of my observations, never have I seen a crew fail to compute and set the approach speeds. This is known in the aviation world as a “killer” item. It is something that, if missed, can cause a fatal accident. Of course, sometimes crews do miss these procedures, and sometimes they make headlines as a result. To understand what the task is and how it is accomplished, one needs to know something about the flight characteristics of commercial jet transports as well as something about the mandated division of labor among members of the crew.

Flaps and Slats

The wings of airliners are designed to enable fast flight, yet performance and safety considerations require airliners to fly relatively slowly just after takeoff and before landing. The wings generate ample lift at high speeds, but the shapes designed for high speed cannot generate enough lift to keep the airplane flying at low speeds. To solve this problem, airplanes are equipped with devices, called slats and flaps,4 that change the shape and area of the wing. Slats and flaps are normally retracted in flight, giving the wing a very clean aerodynamic shape. For slow flight, slats and flaps are extended, enlarging the wing and increasing its coefficient of lift. The positions of the slats and flaps define configurations of the wing. In a “clean” wing configuration, the slats and flaps are entirely retracted. There is a lower limit on the speed at which the airplane can be flown in this configuration. Below this limit, the wing can no longer produce lift. This condition is called a wing stall.5 The stall has an abrupt onset and invariably leads to loss of altitude. Stalls at low altitude are very dangerous. The minimum maneuvering speed for a given configuration and aircraft weight is a speed that guarantees a reasonable margin of safety above the stall speed. Flying slower than this speed is dangerous because the airplane is nearer to a stall. Changing the configuration of the wing by extending the slats and flaps lowers the stall speed of the wing, thus permitting the airplane to fly safely at slower speeds. As the airplane nears the airport, it must slow down to maneuver for landing. To maintain safe flight at slower speeds, the crew must extend the slats and flaps to produce the appropriate wing configurations at the right speeds. The coordination of changing wing configuration with changing speed as the airplane slows down is the first part of the speed memory task.
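The dependence of stall speed on wing configuration and weight can be summarized with the standard steady-flight lift relation (this equation is not given in the original text; the symbols follow common aerodynamics usage):

$$V_{\text{stall}} = \sqrt{\frac{2W}{\rho \, S \, C_{L,\max}}}$$

where $W$ is aircraft weight, $\rho$ is air density, $S$ is wing area, and $C_{L,\max}$ is the maximum coefficient of lift. Extending slats and flaps increases both $S$ and $C_{L,\max}$, so $V_{\text{stall}}$ falls; greater weight raises it, which is why the speeds tabulated for each configuration grow with gross weight.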

4 Slats are normally on the leading edge of a wing; flaps are normally on the trailing edge.

5 This “stall” has nothing to do with the functioning of the engines. Under the right conditions, any airplane can stall with all engines generating maximum thrust.


The second part concerns remembering the speed at which the landing is to be made.

Vref

Within the range of speeds at which the airplane can be flown in its final flap and slat configuration, which speed is right for landing? There are tradeoffs in the determination of landing speed. High speeds are safe in the air because they provide good control response and large stall margins, but they are dangerous on the ground. Limitations on runway length, energy to be dissipated by braking, and the energy to be dissipated if there is an accident on landing, all suggest that landing speed should be as slow as is feasible. The airplane should be traveling slowly enough that it is ready to quit flying when the wheels touch down, but fast enough that control can be maintained in the approach and that if a landing cannot be made, the airplane has enough kinetic energy to climb away from the ground. This speed is called the reference speed, or Vref. Precise control of speed at the correct value is essential to a safe landing.

The minimum maneuvering speeds for the various wing configurations and the speed for landing (called the reference speed) are tabulated in the FLAP/SLAT CONFIGURATION MIN MAN AND REFERENCE SPEED table (Table 1). If weight were not a factor, there would be only one set of speeds to remember, and the task would be much simpler.

Crew Division of Labor

All modern jet transports have two pilot stations, each equipped with a complete set of flight instrumentation. While the airplane is in the air, one pilot is designated the pilot flying (PF) and the other, the pilot not flying (PNF). These roles carry with them particular responsibilities with respect to the conduct of the flight. The pilot flying is concerned primarily with control of the airplane. The PNF communicates with air traffic control (ATC), operates the aircraft systems, accomplishes the checklists required in each phase of flight, and attends to other duties in the cockpit.

THREE DESCRIPTIONS OF MEMORY FOR SPEEDS

With an understanding of the problem and the basics of crew organization, we can now examine the activities in the cockpit that are involved with the generation and maintenance of representations of the maneuvering and reference speeds. I will provide three descriptions of the same activities. The first description is procedural. It is the sort of description that a pilot might provide. The second and third descriptions are cognitive in that they concern representations and processes that transform those representations. The second description treats the representations and processes that are external


TABLE 1
The FLAP/SLAT CONFIGURATION MIN MAN AND REFERENCE SPEED Table as it Appears in the MD-80 Operating Manual

                       GROSS WEIGHT X 1000 POUNDS
                  86   90   94   98  102  106  110  114  118  122  126  130  134  138  142  146  150  154  158  160
0/RET   Min Man  190  194  199  203  207  211  215  219  223  227  230  234  237  241  244  248  251  255  258  260
0/EXT   Min Man  148  152  155  159  162  165  168  171  174  177  180  183  186  188  191  194  197  199  202  203
11/EXT  Min Man  130  133  136  139  142  145  147  150  153  155  158  160  163  165  167  169  172  174  176  177
15/EXT  Min Man  128  131  134  136  139  142  144  147  149  152  154  157  159  162  164  166  169  171  173  174
28/EXT  Min Man  119  122  124  127  130  132  135  137  139  142  144  146  149  151  153  155  157  159  161  162
40/EXT  Min Man  115  118  120  123  125  128  130  132  135  137  139  141  144  146  148  150  152  154  156  157
28/EXT  Vref     111  114  116  118  121  123  125  128  130  132  134  136  138  140  142  144  146  148  150  151
40/EXT  Vref     107  110  112  114  117  119  121  123  126  128  130  132  134  136  138  139  141  143  145  146


to the pilots. It provides the constraints for the final description of the representations and processes that are presumed to be internal to the pilots.

A Procedural Description of Memory for Speeds

Prepare the Landing Data

After initiation of the descent from cruise altitude and before reaching 18,000 feet, the PNF should prepare the landing data. This means computing the correspondences between wing configurations and speeds for the projected landing weight. The actual procedure followed depends on the materials available, company policy, and crew preferences.6 For example, many older cockpits use the table in the operations manual (Table 1) and a hard plastic landing data card on which the arrival weather conditions, go-around thrust settings, landing gross weight, and landing speeds are indicated with a grease pencil. Still others use the table in the operations manual and write the speeds on a piece of paper (flight paperwork, printout of destination weather, and so forth). Crews of airplanes equipped with flight management computer systems can look up the approach speeds on a page display of the computer. The MD-80 uses a booklet of speed cards. The booklet contains a page for each weight interval (usually in 2,000 pound increments) with the appropriate speeds permanently printed on the card (Figure 1).

6 The procedural account given here has been constructed from in-flight observations, and from analyses of video and audio recordings of crews operating in high fidelity simulators of this and other aircraft. The activities described here are documented further in airline operations manuals and training manuals, and in the manufacturer’s operational descriptions. Because these manuals and the documentation provided by the Douglas Aircraft Company are considered proprietary, the actual sources will not be identified. Additional information came from other published sources (for example, Webb, 1971; Tenney, 1988) and from interviews with pilots. There are minor variations among the operating procedures of various airline companies, but the procedure described here can be taken as representative of this activity.

The preparation of landing data consists of the following steps:

  1. Determine the gross weight of the airplane and select the appropriate card in the speed card booklet. Airplane gross weight on the MD-80 is continuously computed and displayed on the fuel quantity indicator on the center flight instrument panel (Figure 2).
  2. Post the selected speed card in a prominent position in the cockpit.
  3. Set the speed bugs on both airspeed indicator (ASI) instruments (Figure 3) to match the speeds shown on the speed card.

On the instrument depicted in Figure 3, the airspeed is shown both in knots (the black-tipped dial pointer indicating 245) and Mach (the digital indicator showing 0.735). The striped indicator at 348 knots indicates the maximum permissible indicated air speed (IAS). The four black speed bug pointers on the edge of the dial are external to the instrument and are manually set by sliding them to the desired positions. The other speed bug (called the “salmon bug” for its orange color) is internal to the instrument and indicates the speed commanded to the flight director, the autothrottle system, or both (the commanded speed is shown differing from the indicated airspeed by about 2 knots).

MANEUVERING FLAPS/SLATS SPEED
0/RET - 227    0/EXT - 177    11 - 155
15 - 152       28 - 142       40 - 137
VREF  28/EXT - 132    40/EXT - 128
122,000 LBS

Figure 1. A speed card from an MD-80 speed card booklet.

Starting with the bug at 227 knots and moving counterclockwise, the bugs indicate: 227—the minimum maneuvering speed with no flaps or slats extended; 177—minimum maneuvering speed with slats, but no flaps, extended; 152—minimum maneuvering speed with flaps at 15° and slats extended; 128—landing speed with flaps at 40° and slats extended (also called Vref).

The preparation of the landing data is usually performed about 25 to 30 minutes prior to landing. The speed bugs are set at this time because at this point crew workload is relatively light and the aircraft is near enough to the destination to make accurate projections of landing gross weight. Later in the approach, the crew workload increases dramatically.

The Descent

During the descent and the approach, the airplane will be slowed in stages, from cruise speed to final approach speed. Before descending through 10,000 feet MSL (mean sea level), the airplane must slow to a speed at or below 250 KIAS (knots indicated airspeed). This speed restriction exists primarily to give pilots more time to see and avoid other traffic as the big jets descend into the congested airspace of the terminal area, and into the realm of small, slow, light aircraft which mostly stay below 10,000 feet.

At about 7,000 feet AFL (above field level), the crew must begin slowing the airplane to speeds that require slat and flap extension. At this point, they use the previously set external speed bugs on the ASI as indicators of where flap extension configuration changes should be made. Some companies specify crew coordination cross-checking procedures for the initial slat selection. For example, “After initial slat selection (0°/EXT), both pilots will visually verify that the slats have extended to the correct position (slat TAKEOFF light on) before reducing speed below 0/RET Min Maneuver speed... ”

Because it is dangerous to fly below the minimum maneuvering speed for any configuration, extending the flaps and slats well before slowing to the minimum maneuvering speed might seem to be a good idea. Doing so would both increase the safety margin on the speeds and give the pilots a wider window of speed (and therefore, of time) for selecting the next flap/slat configuration. Unfortunately, other operational considerations rule this out. As one operations manual puts it, “To minimize the air loads on the flaps/slats, avoid extension and operation near the maximum airspeeds. Extend flaps/slats near the Min Maneuver Speed for the flap/slat configuration.” The extension of the flaps and slats must be coordinated precisely with the changes in airspeed. This makes accurate memory of the speeds even more important than it would be otherwise.

The crew must continue configuration changes as the airplane is slowed further.

The Final Approach

After intercepting the glide slope and beginning the final approach segment, the crew will perform the final approach checklist. One of the elements on this checklist is the challenge/response pair, “Flight instruments and bugs / Set and cross-checked.”

The PNF reads the challenge. Both pilots check the approach and landing bug positions on their own ASI against the bug position on the other pilot’s ASI and against the speeds shown on the speed card. Both crew members will confirm verbally that the bug speeds have been set and cross-checked. For example, the captain (who sits in the left seat) might say, “Set on the left and cross-checked,” whereas the first officer would respond, “Set on the right and cross-checked.” A more complete cross-check would include a specification of the actual values (e.g., “One thirty-two and one twenty-seven set on the left and cross-checked”).

Once final flaps are set during the final approach segment, the PNF calls out airspeed whenever it varies more than plus or minus 5 knots from approach speed.

A Cognitive Description of Memory for Speeds—Representations and Processes Outside the Pilots

Let us now apply the cognitive science frame to the cockpit as a cognitive system. How are the speeds represented in the cockpit? How are these representations transformed, processed, and coordinated with other representations in the descent, approach, and landing? How does the cockpit system remember the speeds at which it is necessary to change the configuration of the wing in order to maintain safe flight?

The observable representations directly involved in the cockpit processes that coordinate airspeed with flap and slat settings are: the gross weight display (Figure 2), the speed card booklet (Figure 1), the two airspeed indicator instruments with internal and external bugs (Figure 3), the speed select window of the flight guidance control panel, and the speed-related verbal exchanges among the members of the crew. The speed-related verbalizations may appear in the communication of the values from PNF to PF while setting the speed bugs, in the initial slat extension cross-check, in the subsequent configuration changes, in the cross-check phase of the before-landing checklist performance, in the PNF’s approach progress report at 500 feet AFL, and in any required speed deviation call outs on the final approach segment after the selection of the landing flap setting.

In addition to the directly observable media listed earlier, we may also assume that some sort of representation of the speeds has been created in two media that are not directly observable: the memories of the two pilots, themselves. Later, we will consider in detail the task environment in which these memories may form. For now, let us simply note that these mental memories are additional media in the cockpit system, which may support and retain internal representations of any of the available external representations of the speeds.


Accessing the Speeds and Setting the Bugs

The speed card booklet is a long-term memory in the cockpit system. It stores a set of correspondences between weights and speeds that are functionally durable, in that they are applicable over the entire operating life of the airplane. The weight/speed correspondences represented in the printed booklet are also physically durable, in that short of destroying the physical medium of the cards, the memory is nonvolatile and cannot be corrupted. This memory is not changed by any crew actions. (It could be misplaced, but there is a backup in the form of the performance tables in the operating manual). The appropriate speeds for the airplane are determined by bringing the representation of the airplane gross weight into coordination with the structure of the speed card booklet. The gross weight is used as a filter on this written memory, making one set of speeds much more accessible than any other. The outcome of the filtering operation is imposed on the physical configuration of the speed card booklet by arranging the booklet such that the currently appropriate speed card is the only one visible. Once performed, the filtering need not be done again during the flight.

The physical configuration of the booklet produced by opening it to the correct page becomes a representation of the cockpit system’s memory for both the projected gross weight and the appropriate speeds. That is, the questions, “Which gross weight did we select?” and “What are the speeds for the selected weight?”, can both be answered by reading the visible speed card. The correspondence of a particular gross weight to a particular set of speeds is built into the physical structure of each card by printing the corresponding weight and speed values on the same card. This is a simple but effective way to produce the computation of the speeds, because selecting the correct weight can’t help but select the correct speeds.
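The way that selecting the correct weight cannot help but select the correct speeds can be sketched computationally. The following is an illustrative model only, not a description of any real avionics: the interval bounds are assumed (the text says cards come in roughly 2,000-pound increments), and the speeds are those printed on the 122,000-lb card shown in Figure 1.

```python
# Illustrative model of the speed-card booklet as a weight-keyed lookup.
# Interval bounds are assumed for the sketch; the speeds are those printed
# on the 122,000-lb card in Figure 1.
SPEED_CARDS = {
    (121_000, 123_000): {
        "0/RET": 227, "0/EXT": 177, "11/EXT": 155,
        "15/EXT": 152, "28/EXT": 142, "40/EXT": 137,
        "Vref 28/EXT": 132, "Vref 40/EXT": 128,
    },
    # ...one entry per weight interval across the table...
}

def select_card(gross_weight):
    """Filter the booklet by gross weight: exactly one card matches."""
    for (lo, hi), speeds in SPEED_CARDS.items():
        if lo <= gross_weight < hi:
            return speeds
    raise LookupError("no speed card covers this weight")
```

As in the physical booklet, the weight/speed pairing is frozen into the data structure itself, so the lookup cannot return a mismatched set of speeds.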

Posting the appropriate speed card where it can be seen easily by both pilots creates a distribution (across social space) of access to information in the system that may have important consequences for several kinds of subsequent processing. Combined with a distribution of knowledge that results from standardized training and experience, this distribution of access to information supports the development of redundant storage of the information and redundant processing. Also, it creates a new trajectory by which speed-relevant information may reach the PF. Furthermore, posting the speed card provides a temporally enduring resource for checking and cross-checking speeds, so that these tasks can be done (or redone) any time. And because the card shows both a set of speeds and the weight for which the speeds are appropriate, it also provides grounds for checking the posted gross weight against the displayed gross weight on the fuel quantity panel (Figure 2), which is just a few inches above the normal posting position of the speed card. This is very useful in deciding whether the wrong weight (and, therefore, the wrong speeds) may have been selected.


7 See Gras et al., 1991 (p. 49ff) for a discussion of the balance among the senses in the modern cockpit. Of course, aural attending may produce an internal representation that endures longer than the spoken words.

The PF may make use of any of the representations the PNF has prepared in order to create a representation of the bug speeds on the PF’s airspeed indicator. The spoken representation and the speed card provide the PF’s easiest access to the values, although it is also possible for the PF to read the PNF’s airspeed indicator. Because all of these representations are available simultaneously, there are multiple opportunities for consistency checks in the system of distributed representation.

When the pilots set the speed bugs, the values that were listed in written form on the speed card, and were represented in spoken form by the PNF, are re-represented as marked positions adjacent to values on the scale of the airspeed indicator (ASI). Because there are two ASIs, this is a redundant representation in the cockpit system. In addition, it provides a distribution of access to information that will be taken advantage of in later processes.

The external speed bug settings capture a regularity in the environment that is of a shorter time scale than the weight/speed correspondences that are represented in the speed card booklet. The speed bug settings are a memory that is malleable, and that fits a particular period of time (this approach). Because of the location of the ASI and the nature of the bugs, this representation is quite resistant to disruption by other activities.

Using the Configuration Change Bugs

The problem to be solved is the coordination of the wing configuration changes with the changes in airspeed as the airplane slows to maneuver for the approach. The location of the airplane in the approach and/or the instructions received from ATC determine the speed to be flown at any point in the approach. The cockpit system must somehow construct and maintain an appropriate relationship between airspeed and slat/flap configuration. The information path that leads from indicated airspeed to flap/slat configuration includes several observable representations in addition to the speed bugs.

The airspeed is displayed on the ASI by the position of the airspeed indicator needle. Thus, as the ASI needle nears the speed bug that represents the clean-configuration minimum maneuvering speed, the pilot flying can call for “Flaps 0.” The spoken flap/slat setting name is coordinated with the labels on the flap handle quadrant. That is, the PNF positions the flap handle adjacent to the label that matches (or is equivalent to) the flap/slat setting name called by the PF. Movement of the flap handle then actuates the flaps and slats themselves, which produce the appropriate wing configurations for the present speed. The speed bugs contribute to this process by providing the bridge between the indicated airspeed and the name of the appropriate flap/slat configuration for the aircraft at its present gross weight.
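The bridging role of the bugs can be caricatured as a threshold function from indicated airspeed to configuration name. This is a rough sketch only: the bug speeds are those of the 122,000-lb card, and the 5-knot margin is an assumption for illustration, not a published procedure.

```python
# Bug speeds from the 122,000-lb card (Figure 1); the 5-knot margin is an
# assumption made for this sketch.
BUGS = [  # (minimum maneuvering speed, configuration to be in below it)
    (227, "0/EXT"),
    (177, "11/EXT"),
    (152, "15/EXT"),
    (128, "28/EXT or 40/EXT (landing flaps)"),
]

def required_config(ias, margin=5):
    """Configuration the wing should be in at this indicated airspeed."""
    config = "0/RET (clean)"
    for bug_speed, name in BUGS:
        # Once the needle is at or below a bug (plus margin), the next
        # configuration in the sequence is required.
        if ias <= bug_speed + margin:
            config = name
    return config
```

The point of the sketch is that the mapping requires no recall of numbers by the person using it: the bugs hold the thresholds, and the needle's position relative to them selects the configuration name.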

The cockpit procedures of some airlines require that the configuration that is produced by the initial extension of slats be verified by both crew members (by reference to an indicator on the flight instrument panel) before slowing below the clean MinMan speed. This verification activity provides a context in which disagreements between the settings of the first speed bug on the two ASIs can be discovered. Also, it may involve a consultation with the speed card by either pilot to check the MinMan speed, or even a comparison of the weight indicated by the selected speed card and the airplane gross weight as displayed on the fuel quantity panel. The fact that these other checks are so easy to perform with the available resources highlights the fact that the physical configuration of the speed card is both a memory for speeds and a memory for a decision that was made earlier in the flight about the appropriate approach speed. Any of these activities may also refresh either pilot’s internal memory for the speeds or the gross weight. The depth of the processing engaged in here, that is, how many of these other checks are performed, may depend on the time available and the pilots’ sense about whether or not things are going well. Probably, it is not possible to predict how many other checks may be precipitated by this mandated cross-check, but it is important to note that several are possible and may occur at this point.

When the pilot flying calls for a configuration change, the PNF can, and should, verify that the speed is appropriate for the commanded configuration change. The mandated division of labor, in which the PF calls for the flap setting and the PNF actually selects it by moving the flap handle, permits the PF to keep hands on the yoke and throttles during the flap extension. This facilitates airplane control because changes in pitch attitude normally occur during flap extension. It is likely that facilitating control was the original justification for this procedure. However, this division of labor also has a very attractive system-level cognitive side effect in that it provides additional redundancies in checking the bug settings and the correspondences between speeds and configuration changes.

Using the Salmon Bug

On the final approach, the salmon bug provides the speed reference for both pilots, as both have speed-related tasks to perform. The spatial relation between the ASI needle and the salmon bug provides the pilots with an indication of how well the airplane is tracking the speed target, and may give indications of the effects on airspeed of pitch changes input by the crew (or by the autoflight systems tracking the glide slope during a coupled approach) or of local weather conditions, such as windshear.

The salmon bug is also the reference against which the PNF computes the deviation from target speed. The PNF must make the mandatory call out at 500 feet AFL, as well as any other call outs required if the airspeed deviates by more than five knots from the target approach speed. In these call outs, the trajectory of task-relevant representational state is from the relationship between the ASI needle and the salmon bug, to a verbalization by the PNF directed to the PF. Because the final approach segment is visually intensive for the PF, the conversion of the airspeed information from the visual into the auditory modality by the PNF permits the PF access to this important information, without requiring the allocation of precious visual resources to the ASI.
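The PNF's deviation monitoring amounts to a comparison of indicated airspeed against the salmon-bug value. A minimal sketch, in which only the five-knot tolerance comes from the text; the phrasing of the callout is invented for illustration:

```python
def speed_callout(ias, target, tolerance=5):
    """Return a deviation callout, or None when airspeed is within tolerance.

    The 5-knot tolerance is from the procedure described in the text;
    the callout wording here is hypothetical.
    """
    deviation = ias - target
    if abs(deviation) > tolerance:
        sign = "plus" if deviation > 0 else "minus"
        return f"airspeed {sign} {abs(deviation)}"
    return None
```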

Summary of Representations and Processes Outside the Pilot

Setting the speed bugs is a matter of producing a representation in the cockpit environment that will serve as a resource that organizes performances that are to come later. This structure is produced by bringing representations into coordination with one another (the gross weight readout, the speed card, the verbalizations, and so forth) and will provide the representational state (relations between speed bug locations and ASI needle positions) that will be coordinated with other representations (names for flap positions, flap handle quadrant labels, flap handle positions, and so forth) ten to fifteen minutes later, when the airplane begins slowing down. I call this entire process a cockpit system’s “memory” because it consists of the creation, inside the system, of a representational state that is then saved and used to organize subsequent activities.

A Cognitive Description of Memory for Speeds—Representations and Processes Inside the Pilots

Having described the directly observable representational states involved in the memory for speeds in the cockpit system during the approach, we ask of that same cycle of activity, “What are the cognitive tasks facing the pilots?”

280

HUTCHINS

The description of transformations of the representational state in the previous section is both a description of how the system processes information and a specification of cognitive tasks facing individual pilots. It is, in fact, a better cognitive task specification than can be had by simply thinking in terms of procedural descriptions. The task specification is detailed enough, in some cases, to put constraints on the kinds of representations and processes that the individuals must use.

In much of the cockpit’s remembering, significant functions are achieved by a person interpreting material symbols, rather than by a person recalling those symbols from his or her memory. So we must go beyond looking for things that resemble our expectations about human memory to understand the phenomena of memory in the cockpit as a cognitive system.

Computing the Speeds and Setting the Bugs

The speeds are computed by pattern matching on the airplane gross weight and the weights provided on the cards. The pilots don’t have to remember what the weights are that appear on the cards. It is necessary only to find the place of the indicated gross weight value in the cards that are provided. However, repeated exposure to the cards may lead to implicit learning of the weight intervals, and whatever such knowledge that does develop may be a resource in selecting the appropriate speed card for any given gross weight. With experience, pilots may develop internal structures to coordinate with predictable structure in the task environment.
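Computationally, the card lookup described above is an interval search: the gross weight selects the one card whose weight range contains it. A minimal sketch follows, with entirely invented weights and speeds (none of these numbers comes from an actual aircraft manual):

```python
# A minimal sketch of the speed-card lookup. SPEED_CARDS maps invented
# gross-weight intervals (lbs) to invented bug speeds (knots); real
# cards carry many more values.
SPEED_CARDS = [
    # (min_weight, max_weight, {bug_name: speed_in_knots})
    (100_000, 110_000, {"Vref": 128, "flaps_15": 148, "min_man_clean": 168}),
    (110_000, 120_000, {"Vref": 134, "flaps_15": 154, "min_man_clean": 174}),
    (120_000, 130_000, {"Vref": 140, "flaps_15": 160, "min_man_clean": 180}),
]

def select_card(gross_weight):
    """Find the one card whose weight interval contains gross_weight."""
    for low, high, speeds in SPEED_CARDS:
        if low <= gross_weight < high:
            return speeds
    raise ValueError(f"no speed card covers weight {gross_weight}")

# At 113,500 lbs the 110,000-120,000 card applies:
print(select_card(113_500)["Vref"])  # → 134
```

The point of the sketch is that the pilot need not hold the card contents in memory; the task is a search over structure already present in the environment.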

Once the appropriate card has been selected, the values must be read from the card. Several design measures have been taken to facilitate this process. Frequently used speeds appear in larger font size than do infrequently used speeds, and there is a box around the Vref speeds to help pilots find these values (Wickens & Flach, 1988). Reading is, probably, an overlearned skill for most pilots. Still, there is a need for working memory: transposition errors are probably the most frequent sort of error committed in this process (Norman, 1991; Wickens & Flach, 1988).

Setting any single speed bug to a particular value requires the pilot to hold the target speed in memory, read the speed scale, locate the target speed on the speed scale (a search similar to that for weight in the speed card booklet), and then manually move the speed bug to the scale position. Because not all tick marks on the speed scale have printed values adjacent to them, some interpolation, or counting of ticks, is also required.
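The interpolation step can be made concrete. Assuming, purely for illustration, a scale with tick marks every 5 knots and printed numbers every 20, locating a target speed amounts to finding the nearest lower printed label and counting ticks past it:

```python
# Illustrative sketch of locating a target speed on the ASI scale.
# The tick spacing, label spacing, and scale geometry are invented.
TICK_INTERVAL = 5      # knots between tick marks
LABEL_INTERVAL = 20    # knots between printed numbers

def scale_position(speed, knots_per_degree=1.0, zero_speed=60):
    """Angular position (degrees from the zero-speed mark) for a speed."""
    return (speed - zero_speed) / knots_per_degree

def ticks_past_label(speed):
    """How many unlabeled ticks lie between the nearest lower printed
    number and the target speed -- the 'counting of ticks' a pilot does."""
    nearest_label = (speed // LABEL_INTERVAL) * LABEL_INTERVAL
    return (speed - nearest_label) // TICK_INTERVAL

print(ticks_past_label(135))  # 135 kt is 3 ticks past the "120" label → 3
```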

Coordinating reading the speeds with setting the bugs is more complicated. The actions of reading and setting may be interleaved in many possible orders. One could read each speed before setting it or read several speeds, retain them in memory, then set them one by one. Other sequences are also possible. The demands on working memory will depend on the strategy chosen. If several speeds are to be remembered and then set, they may be rehearsed to maintain the memory. Such a memory is vulnerable to interference from other tasks in the same modality (Wickens & Flach, 1988), and the breakdown of such a memory may lead to a shift to a strategy that has less demanding memory requirements.

The activities involved in computing the bug speeds and rerepresenting them in several other media may permit them to be represented in a more enduring way in the memory of the PNF. Similarly, hearing the spoken values, possibly reading them from the landing data card, and setting them on the airspeed indicator, may permit a more enduring representation of the values to form in the memory of the PF. Lacking additional evidence, we cannot know the duration or quality of these memories. But we know from observation that there are ample opportunities for rehearsals and associations of the rehearsed values with representations in the environment.

Using the Configuration Change Bugs

The airspeed indicator needle moves counter-clockwise as the airplane slows. Because the airspeed scale represents speed as spatial position and numerical relations as spatial relations, the airspeed bugs segment the face of the ASI into regions that can be occupied by the ASI needle. The relation of the ASI needle to the bug positions is thus constructed as the location of the airplane’s present airspeed in a space of speeds. The bugs are also associated with particular flap/slat setting names (e.g., 0°/RET, 15°/EXT, and so forth), so the regions on the face of the ASI have meaning both as speed regimes and as locations for flap/slat setting names. Once the bugs have been set, the pilots do not simply take in sensory data from the ASI; rather, the pilots impose additional meaningful structure on the image of the ASI. They use the bugs to define regions of the face of the ASI, and they associate particular meanings with those regions (Figure 4). The coordination of speed with wing configuration is achieved by superimposing representations of wing configuration and representation of speed on the same instrument.

Once the bugs are set, it is not necessary actually to read the scale values where they are placed. It is necessary, however, to remember the meanings of each of the bugs with respect to names for flap/slat configurations. Since the regions of the speed scale that are associated with each configuration are not permanently marked on the jet ASI, the pilot must construct the meanings of the regions in the act of “seeing” the ASI with bugs as a set of meaningful regions.
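This act of “seeing” the bugged ASI as a set of meaningful regions can be modeled as a lookup from current speed to region name. The bug speeds and configuration labels below are illustrative only, loosely patterned on Figure 4:

```python
# Sketch: the speed bugs partition the ASI face into regions whose
# meanings are flap/slat configuration names. Speeds and labels are
# invented for illustration.
import bisect

# Bug speeds in ascending order; each is the lower boundary of a regime.
BUGS = [(128, "40/EXT"), (148, "15/EXT"), (168, "0/EXT"), (188, "0/RET")]

def configuration_for(speed):
    """Name the configuration region containing the current speed."""
    boundaries = [s for s, _ in BUGS]
    i = bisect.bisect_right(boundaries, speed) - 1
    if i < 0:
        return "below Vref"
    return BUGS[i][1]

print(configuration_for(172))  # between the 168 and 188 bugs → 0/EXT
```

The pilot, of course, performs no such symbolic lookup; the table and search here stand in for a perceptual judgment of which region the needle occupies.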

Figure 4. Meaningful regions of the airspeed indicator face. The pilots "see" regions of the airspeed indicator scale as having meanings in terms of the configurations required to fly the airplane at the speeds in each region.

Speed bugs are part of what Luria called a functional system (Luria, 1979). It is a constellation of structures, some of them internal to the human actors, some external, involved in the performance of some invariant task. It is commonplace to refer to the speed bug as a memory aid (Norman, 1991; Tenney, 1988). Speed bugs are said to help the pilot remember the critical speeds. But now that we have looked at how speed bugs are set up and how they are used, it is not clear that they contribute to the pilot’s memory at all. The functional system of interest here is the one that controls the coordination of airspeeds with wing configurations. It is possible to imagine a functional system without speed bugs, in which pilots are required to read the speeds, remember the speeds, remember which configuration change goes with each speed, read the scale, and so forth. Adding speed bugs to the system does nothing to alter the memory of the pilots, but it does permit a different set of processes to be assembled into a functional system that achieves the same results as the system without speed bugs. In the functional system with speed bugs, some of the memory requirements for the pilot are reduced. What was accomplished without speed bugs by remembering speed values, reading the ASI needle values, and comparing the two values is accomplished with the use of speed bugs by judgments of spatial proximity. Individual pilot memory has not been enhanced; rather, the memory function has now become a property of a larger system in which the individual engages in a different sort of cognitive behavior. The beauty of devices like speed bugs is that they permit these reconfigurations of functional systems in ways that reduce the requirements for scarce cognitive resources. To call speed bugs a “memory aid” for the pilots is to mistake the cognitive properties of the reorganized functional system for the cognitive properties of one of its human components. Speed bugs do not help pilots remember speeds; rather, they are part of the process by which the cockpit system remembers speeds.

Using the Salmon Bug

Without a speed bug, on final approach the PF must remember the approach speed, read the airspeed indicator scale to find the remembered value of the approach speed on the airspeed indicator scale, and compare the position of the ASI needle on the scale with the position of the approach speed on the scale. With the salmon bug set, the pilot no longer needs to read the airspeed indicator scale. He or she simply looks to see whether or not the indicator needle is lined up with the salmon bug. Thus, a memory and scale reading task is transformed into a judgment of spatial adjacency. It is important to make these tasks as simple as possible because there are many other things the pilot must do on the final approach. The pilot must continue monitoring the airspeed while also monitoring the glide path and runway alignment of the aircraft. Deviations in any of these may require corrective actions.

In making the required speed call outs, the PNF uses the salmon bug in a way similar to the way the PF does. To determine the numerical relation between the indicated speed and the setting of the salmon bug, the PNF could use mental arithmetic and subtract the current speed from the value of Vref. This is the sort of cognitive task we imagine might face the crew if we simply examined the procedural description. A less obvious but equally effective method is to use the scale of the ASI as a computational medium. The base of the salmon bug is about ten knots wide in the portion of the speed scale relevant to maneuvering for approach and landing. To determine if the current speed is within 5 knots of the target, one need only see if the airspeed pointer is pointing at any part of the body of the salmon bug. This strategy permits a conceptual task to be implemented by perceptual processes.
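The equivalence of the two strategies can be stated precisely: because the bug body spans roughly ten knots of scale, the perceptual test “is the needle on the bug?” computes the same predicate as the mental subtraction. A sketch, with an invented Vref:

```python
# Two functionally equivalent ways to decide whether a deviation
# callout is required (more than 5 knots off the target speed).
# Vref and the bug geometry are invented for illustration.
VREF = 134
BUG_HALF_WIDTH = 5  # knots of scale covered by each half of the bug body

def callout_required_arithmetic(speed):
    """The mental-arithmetic version: subtract and compare."""
    return abs(speed - VREF) > 5

def callout_required_perceptual(speed):
    """The perceptual version: is the needle off the bug body?"""
    needle_on_bug = (VREF - BUG_HALF_WIDTH) <= speed <= (VREF + BUG_HALF_WIDTH)
    return not needle_on_bug

# The two strategies agree at every speed checked:
for s in (131, 139, 140, 142):
    assert callout_required_arithmetic(s) == callout_required_perceptual(s)
print(callout_required_perceptual(142))  # 8 knots fast → True
```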

Having determined the deviation from target speed, the PNF calls it out to the PF. Notice the role of the representation of information. Twice in this example, a change in the nature of the representation of information results in a change in the nature of the cognitive task facing the pilot. In the first case, the speed bug itself permits a simple judgment of spatial proximity to be substituted for a scale reading task operation. In the second case, the PNF further transforms the task facing the PF from a judgment of spatial proximity (requiring scarce visual resources) into a task of monitoring a particular aural cue (a phrase like, “five knots fast”). Notice also that the change in the task for the pilot flying changes the kinds of internal knowledge structures that must be brought into play in order to decide on an appropriate action.


The Pilot’s Memory for Speeds

Memory is normally thought of as a psychological function internal to the individual. However, memory tasks in the cockpit may be accomplished by functional systems which transcend the boundaries of the individual actor. Memory processes may be distributed among human agents, or between human agents and external representational devices.


be caused by a wider bug never arises), and provides a bit of structure in the world for the pilots that can be opportunistically exploited to solve an operational problem that the designers never anticipated.

COGNITIVE PROPERTIES OF THE COCKPIT SYSTEM

The task is to control the configuration of the airplane to match the changes in speed required for maneuvering in the approach and landing. The flaps are controlled by positioning the flap handle. The flap handle is controlled by aligning it with written labels for flap positions that correspond to spoken labels produced by the PF. The spoken labels are produced at the appropriate times by speaking the name of the region on the ASI face that the needle is approaching. The regions of the ASI are delimited by the settings of the speed bugs. The names of the regions are produced by the PF through the application of a schema for seeing the dial face. The speed bugs are positioned by placing them in accordance with the speeds listed on the selected speed card. And the speed card is selected by matching the weight printed on the bottom with the weight displayed on the fuel quantity panel.

This system makes use of representations in many different media. The media themselves have very different properties. The speed card booklet is a relatively permanent representation. The spoken representation is ephemeral and endures only in its production. The memory is ultimately stored for use in the physical state of the speed bugs. It is represented temporarily in the spoken interchanges, and represented with unknown persistence in the memories of the individual pilots. The pilots’ memories are clearly involved, but they operate in an environment where there is a great deal of support for recreating the memory.

Speed bugs are involved in a distribution of cognitive labor across social space. The speed bug helps the solo pilot by simplifying the task of determining the relation of present airspeed to Vref, thereby reducing the amount of time required for the pilot’s eyes to be on the airspeed indicator during the approach. With multi-pilot crews, the cognitive work of reading the airspeed indicator and monitoring the other instruments on the final approach can be divided among the pilots. The PF can dedicate visual resources to monitoring the progress of the aircraft, whereas the pilot not flying can use visual resources to monitor airspeed and transform the representation of the relation between current airspeed and Vref from a visual to an auditory form.

Speed bugs permit a shift in the distribution of cognitive effort across time. They enable the crew to calculate correspondences between speeds and configurations during a low workload phase of flight, and save the results of that computation for later use. Internal memory also supports this redistribution of effort across time, but notice the different properties of the two kinds of representation; a properly set speed bug is much less likely than a pilot’s memory to “forget” its value. The robustness of the physical device as a representation permits the computation of speeds to be moved arbitrarily far in time from the moment of their use and is relatively insensitive to the interruptions, the distractions, and the delays that may disrupt internal memories.

This is a surprisingly redundant system. Not only is there redundant representation in memory; there is also redundant processing and redundant checking. The interaction of the representations in the different media gives the overall system the properties it has. This is not to say that knowing about the people is not important, but rather to say that much of what we care about is in the interaction of the people with each other and with physical structure in the environment.

The analog ASI display maps an abstract conceptual quantity, speed, onto an expanse of physical space. This mapping of conceptual structure onto physical space allows important conceptual operations to be defined in terms of simple perceptual procedures. Simple internal structures (the meanings of the regions on the dial face defined by the positions of the speed bugs) in interaction with simple and specialized external representations perform powerful computations.

DISCUSSION

The cockpit system remembers its speeds, and the memory process emerges from the activity of the pilots. The memory of the cockpit, however, is not made primarily of pilot memory. A complete theory of individual human memory would not be sufficient to understand that which we wish to understand because so much of the memory function takes place outside the individual. In some sense, what the theory of individual human memory explains is not how this system works, but why this system must contain so many components that are functionally implicated in cockpit memory, yet are external to the pilots themselves.

The speed bug is one of many devices in the cockpit that participate in functional systems which accomplish memory tasks. The altitude alerting system and the many pieces of paper that appear in even the most modern glass cockpit are other examples. The properties of functional systems that are mediated by external representations differ from those that rely exclusively on internal representations, and may depend on the physical properties of the external representational media. Such factors as the endurance of a representation, the sensory modality via which it is accessed, its vulnerability to disruption, and the competition for modality specific resources may all influence the cognitive properties of such a system.

This article presents a theoretical framework that takes a socio-technical system, rather than an individual mind, as its primary unit of analysis. This theory is explicitly cognitive in the sense that it is concerned with how information is represented and how representations are transformed and propagated through the system. Such a theory can provide a bridge between the information processing properties of individuals and the information processing properties of a larger system, such as an airplane cockpit.

One of the primary jobs of a theory is to help us look in the right places for answers to questions. This system-level cognitive view directs our attention beyond the cognitive properties of individuals to the properties of external representations and to the interactions between internal and external representations. Technological devices introduced into the cockpit invariably affect the flow of information in the cockpit. They may determine the possible trajectories of information or the kinds of transformations of information structure that are required for propagation. Given the current rapid pace of introduction of computational equipment, these issues are becoming increasingly important.

REFERENCES

Gras, A., Moricot, C., Poirot-Delpech, S., & Scardigli, V. (1991). Le Pilote, le contrôleur, et l'automate (Réédition du rapport prédéfinition PIRTTEM—CNRS et du rapport final SERT—Ministere des transports, ed.). Paris: Editions de L’Iris.

Hutchins, E. (1990). The technology of team navigation. In J. Galegher, R. Kraut, & C. Egido (Eds.), Intellectual teamwork: Social and technical bases of collaborative work. Hillsdale, NJ: Erlbaum.

Hutchins, E. (1991). Organizing work by adaptation. Organization Science, 2, 14-38.

Hutchins, E. (1995). Cognition in the wild. Cambridge, MA: MIT Press.

Luria, A.R. (1979). The making of mind: A personal account of Soviet psychology (M. Cole & S. Cole, Trans.). Cambridge, MA: Harvard University Press.

March, J., & Simon, H. (1958). Organizations. New York: Wiley.

Morgan, G. (1986). Images of organization. Beverly Hills, CA: Sage.

Newell, A., & Simon, H.A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice Hall.

Norman, D.A. (1991). Cognitive science in the cockpit. CSERIAC Gateway, 2, 1-6.

Norman, D.A. (1993). Things that make us smart. Reading, MA: Addison Wesley.

Simon, H.A. (1981). The sciences of the artificial (2nd ed.). Cambridge, MA: MIT Press.

Simon, H.A., & Kaplan, C.A. (1989). Foundations of cognitive science. In M. Posner (Ed.), Foundations of cognitive science. Cambridge, MA: MIT Press.

Tenney, D.C. (1988, December). Bug speeds pinpointed by autothrottles mean less jockeying but more thinking. Professional Pilot, pp. 96-99.

Webb, J. (1971). Fly the wing. Ames, IA: Iowa State University Press.

Wickens, C., & Flach, J. (1988). Information processing. In E. Wiener & D. Nagel (Eds.), Human factors in aviation. New York: Academic.

GLOSSARY

ASI Air speed indicator.

ATC Air traffic control.

Flap A panel mounted on the trailing edge of the wing that can be extended to change the shape of the wing and increase its area.

IAS Indicated air speed. The airspeed determined by the dynamic pressure of the airstream over the airplane. This may be different from true airspeed. It is the speed that is indicated on the ASI.

MinMan speed The minimum maneuvering speed. A speed at which an airplane has a reasonable margin over a stall given the current configuration. This is usually 1.3 times the stall speed for the configuration.

PF Pilot flying. The crewmember who is responsible for flying and navigating the airplane.

PNF Pilot not flying. The crewmember who is responsible for communicating with ATC and operating the airplane's non-flying systems, air conditioning and pressurization, for example.

Slat A panel mounted on the leading edge of the wing that can be extended to change the shape of the wing and increase its area. Slats are normally extended before flaps.

Vref The approach reference speed or velocity. This is the target speed for the final approach segment.

4

Studying Context: A Comparison of Activity Theory, Situated Action Models, and Distributed Cognition

Bonnie A. Nardi

It has been recognized that system design will benefit from explicit study of the context in which users work. The unaided individual divorced from a social group and from supporting artifacts is no longer the model user. But with this realization about the importance of context come many difficult questions. What exactly is context? If the individual is no longer central, what is the correct unit of analysis? What are the relations between artifacts, individuals, and the social groups to which they belong? This chapter compares three approaches to the study of context: activity theory, situated action models, and distributed cognition. I consider the basic concepts each approach promulgates and evaluate the usefulness of each for the design of technology. 1

A broad range of work in psychology (Leont'ev 1978; Vygotsky 1978; Luria 1979; Scribner 1984; Newman, Griffin, and Cole 1989; Norman 1991; Salomon 1993), anthropology (Lave 1988; Suchman 1987; Flor and Hutchins 1991; Hutchins 1991a; Nardi and Miller 1990, 1991; Gantt and Nardi 1992; Chaiklin and Lave 1993), and computer science (Clement 1990; Mackay 1990; MacLean et al. 1990) has shown that it is not possible to fully understand how people learn or work if the unit of study is the unaided individual with no access to other people or to artifacts for accomplishing the task at hand. Thus we are motivated to study context to understand relations among individuals, artifacts, and social groups. But as human-computer interaction researchers, how can we conduct studies of context that will have value to designers who seek our expertise?

Brooks (1991) argues that HCI specialists will be most valuable to designers when we can provide (1) a broad background of comparative understanding over many domains, (2) high-level analyses useful for evaluating the impact of major design decisions, and (3) information that suggests actual designs rather than simply general design guidelines or metrics for evaluation. To be able to provide such expertise, we must develop an appropriate analytical abstraction that ``discards irrelevant details while isolating and emphasizing those properties of artifacts and situations that are most significant for design'' (Brooks, 1991, emphasis added). It is especially difficult to isolate and emphasize critical properties of artifacts and situations in studies that consider a full context because the scope of analysis has been widened to accommodate such holistic breadth. Taking context seriously means finding oneself in the thick of the complexities of particular situations at particular times with particular individuals. Finding commonalities across situations is difficult because studies may go off in so many different directions, making it problematic to provide the comparative understanding across domains that Brooks (1991) advocates. How can we confront the blooming, buzzing confusion that is ``context'' and still produce generalizable research results?

This chapter looks at three approaches to the study of context—activity theory, situated action models, and the distributed cognition approach—to see what tools each offers to help manage the study of context. In particular we look at the unit of analysis proposed by each approach, the categories offered to support a description of context, the extent to which each treats action as structured prior to or during activity, and the stance toward the conceptual equivalence of people and things.

Activity theory, situated action models, and distributed cognition are evolving frameworks and will change and grow as each is exercised with empirical study. In this chapter I ask where each approach seems to be headed and what its emphases and perspectives are. A brief overview of each approach to studying context will be given, followed by a discussion of some critical differences among the approaches. An argument is made for the advantages of activity theory as an overall framework while at the same time recognizing the value of situated action models and distributed cognition analyses.

SITUATED ACTION MODELS

Situated action models emphasize the emergent, contingent nature of human activity, the way activity grows directly out of the particularities of a given situation. 2 The focus of study is situated activity or practice, as opposed to the study of the formal or cognitive properties of artifacts, or structured social relations, or enduring cultural knowledge and values. Situated action analysts do not deny that artifacts or social relations or knowledge or values are important, but they argue that the true locus of inquiry should be the ``everyday activity of persons acting in [a] setting'' (Lave 1988).3 That this inquiry is meant to take place at a very fine-grained level of minutely observed activities, inextricably embedded in a particular situation, is reflected in Suchman's (1987) statement that ``the organization of situated action is an emergent property of moment-by-moment interactions between actors, and between actors and the environments of their action.''

Lave (1988) identifies the basic unit of analysis for situated action as ``the activity of persons acting in setting.'' The unit of analysis is thus not the individual, not the environment, but a relation between the two. A setting is defined as ``a relation between acting persons and the arenas in relation with which they act.'' An arena is a stable institutional framework. For example, a supermarket is an arena within which activity takes place. For the individual who shops in the supermarket, the supermarket is experienced as a setting because it is a ``personally ordered, edited version'' of the institution of the supermarket. In other words, each shopper shops only for certain items in certain aisles, depending on her needs and habits. She has thus ``edited'' the institution to match her personal preferences (Lave 1988).

An important aspect of the ``activity of persons-acting in setting'' as a unit of analysis is that it forces the analyst to pay attention to the flux of ongoing activity, to focus on the unfolding of real activity in a real setting. Situated action emphasizes responsiveness to the environment and the improvisatory nature of human activity (Lave 1988). By way of illustrating such improvisation, Lave's (1988) ``cottage cheese'' story has become something of a classic. A participant in the Weight Watchers program had the task of fixing a serving of cottage cheese that was to be three-quarters of the two-thirds cup of cottage cheese the program normally allotted. 4 To find the correct amount of cottage cheese, the dieter, after puzzling over the problem a bit, ``filled a measuring cup two-thirds full of cheese, dumped it out on a cutting board, patted it into a circle, marked a cross on it, scooped away one quadrant, and served the rest'' (Lave 1988).

In emphasizing improvisation and response to contingency, situated action deemphasizes study of more durable, stable phenomena that persist across situations. The cottage cheese story is telling: it is a one-time solution to a one-time problem, involving a personal improvisation that starts and stops with the dieter himself. It does not in any serious way involve the enduring social organization of Weight Watchers or an analysis of the design of an artifact such as the measuring cup. It is a highly particularistic accounting of a single episode that highlights an individual's creative response to a unique situation.

Empirical accounts in studies of situated action tend to have this flavor. Lave (1988) provides detailed descriptions of grocery store activity such as putting apples into bags, finding enchiladas in the frozen food section, and ascertaining whether packages of cheese are mispriced. Suchman (1987) gives a detailed description of experiments in which novices tried to figure out how to use the double-sided copy function of a copier. Suchman and Trigg (1991) describe the particulars of an incident of the use of a baggage- and passenger-handling form by airport personnel. These analyses offer intricately detailed observations of the temporal sequencing of a particular train of events rather than being descriptive of enduring patterns of behavior across situations.

A central tenet of the situated action approach is that the structuring of activity is not something that precedes it but can only grow directly out of the immediacy of the situation (Suchman 1987; Lave 1988). The insistence on the exigencies of particular situations and the emergent, contingent character of action is a reaction to years of influential work in artificial intelligence and cognitive science in which ``problem solving'' was seen as a ``series of objective, rational pre-specified means to ends'' (Lave 1988) and work that overemphasized the importance of plans in shaping behavior (Suchman 1987). Such work failed to recognize the opportunistic, flexible way that people engage in real activity. It failed to treat the environment as an important shaper of activity, concentrating almost exclusively on representations in the head—usually rigid, planful ones—as the object of study.

Situated action models provide a useful corrective to these restrictive notions that put research into something of a cognitive straitjacket. Once one looks at real behavior in real situations, it becomes clear that rigid mental representations such as formulaic plans or simplistically conceived ``rational problem solving'' cannot account for real human activity. Both Suchman (1987) and Lave (1988) provide excellent critiques of the shortcomings of the traditional cognitive science approach.

ACTIVITY THEORY

Of the approaches examined in this chapter, activity theory is the oldest and most developed, stretching back to work begun in the former Soviet Union in the 1920s. Activity theory is complex and I will highlight only certain aspects here. (For summaries see Leont'ev 1974; Bødker 1989; and Kuutti 1991; for more extensive treatment see Leont'ev 1978; Wertsch 1981; Davydov, Zinchenko, and Talyzina 1982; and Raeithel 1991.) This discussion will focus on a core set of concepts from activity theory that are fundamental for studies of technology.

In activity theory the unit of analysis is an activity. Leont'ev, one of the chief architects of activity theory, describes an activity as being composed of subject, object, actions, and operations (1974). A subject is a person or a group engaged in an activity. An object (in the sense of ``objective'') is held by the subject and motivates activity, giving it a specific direction. ``Behind the object,'' he writes, ``there always stands a need or a desire, to which [the activity] always answers.'' Christiansen (this volume) uses the term ``objectified motive,'' which I find a congenial mnemonic for a word with as many meanings in English as ``object.'' One might also think of the ``object of the game'' or an ``object lesson.''

Actions are goal-directed processes that must be undertaken to fulfill the object. They are conscious (because one holds a goal in mind), and different actions may be undertaken to meet the same goal. For example,

a person may have the object of obtaining food, but to do so he must carry out actions not immediately directed at obtaining food.... His goal may be to make a hunting weapon. Does he subsequently use the weapon he made, or does he pass it on to someone else and receive a portion of the total catch? In both cases, that which energizes his activity and that to which his action is directed do not coincide (Leont'ev 1974).

Christiansen (this volume) provides a nice example of an object from her research on the design of the information systems used by Danish police: ``[The detective] expressed as a vision for [the] design [of his software system] that it should be strong enough to handle a `Palme case,' referring to the largest homicide investigation known in Scandinavia, when the Swedish prime minister Oluf Palme was shot down on a street in Stockholm in 1986!'' This example illustrates Raeithel and Velichkovsky's depiction of objects as

actively ``held in the line of sight.'' ... the bull's eye of the archer's target, which is the original meaning of the German word Zweck (``purpose''), for example, is a symbol of any future state where a real arrow hits near it. Taking it into sight, as the desired ``end'' of the whole enterprise, literally causes this result by way of the archer's action-coupling to the physical processes that let the arrow fly and make it stop again (Raeithel and Velichkovsky, this volume).

Thus, a system that can handle a ``Palme case'' is a kind of bull's eye that channels and directs the detective's actions as he designs the software system that he envisions.

Objects can be transformed in the course of an activity; they are not immutable structures. As Kuutti (this volume) notes, ``It is possible that an object itself will undergo changes during the process of an activity.'' Christiansen (this volume) and Engeström and Escalante (this volume) provide case studies of this process. Objects do not, however, change on a moment-by-moment basis. There is some stability over time, and changes in objects are not trivial; they can change the nature of an activity fundamentally (see, for example, Holland and Reeves, this volume).

Actions are similar to what are often referred to in the HCI literature as tasks (e.g., Norman 1991). Activities may overlap in that different subjects engaged together in a set of coordinated actions may have multiple or conflicting objects (Kuutti 1991).

Actions also have operational aspects, that is, the way the action is actually carried out. Operations become routinized and unconscious with practice. When learning to drive a car, the shifting of the gears is an action with an explicit goal that must be consciously attended to. Later, shifting gears becomes operational and ``can no longer be picked out as a special goal-directed process: its goal is not picked out and discerned by the driver; and for the driver, gear shifting psychologically ceases to exist'' (Leont'ev 1974). Operations depend on the conditions under which the action is being carried out. If a goal remains the same while the conditions under which it is to be carried out change, then ``only the operational structure of the action will be changed'' (Leont'ev 1974).

Activity theory holds that the constituents of activity are not fixed but can dynamically change as conditions change. All levels can move both up and down (Leont'ev 1974). As we saw with gear shifting, actions become operations as the driver habituates to them. An operation can become an action when ``conditions impede an action's execution through previously formed operations'' (Leont'ev 1974). For example, if one's mail program ceases to work, one continues to send mail by substituting another mailer, but it is now necessary to pay conscious attention to using an unfamiliar set of commands. Notice that here the object remains fixed, but goals, actions, and operations change as conditions change. As Bødker (1989) points out, the flexibility recognized by activity theory is an important distinction between activity theory and other frameworks such as GOMS: activity theory ``does not predict or describe each step in the activity of the user (as opposed to the approach of Card, Moran and Newell, 1983)'' because it recognizes that changing conditions can realign the constituents of an activity.

A key idea in activity theory is the notion of mediation by artifacts (Kuutti 1991). Artifacts, broadly defined to include instruments, signs, language, and machines, mediate activity and are created by people to control their own behavior. Artifacts carry with them a particular culture and history (Kuutti 1991) and are persistent structures that stretch across activities through time and space. As Kaptelinin (chapter 3, this volume) points out, recognizing the central role of mediation in human thought and behavior may lead us to reframe the object of our work as ``computer-mediated activity,'' in which the starring role goes to the activity itself, rather than as ``human-computer interaction,'' in which the relationship between the user and a machine is the focal point of interest.

Activity theory, then, proposes a very specific notion of context: the activity itself is the context. What takes place in an activity system composed of object, actions, and operations is the context. Context is constituted through the enactment of an activity involving people and artifacts. Context is not an outer container or shell inside of which people behave in certain ways. People consciously and deliberately generate contexts (activities) in part through their own objects; hence context is not just ``out there.''

Context is both internal to people—involving specific objects and goals—and, at the same time, external to people, involving artifacts, other people, specific settings. The crucial point is that in activity theory, external and internal are fused, unified. In Zinchenko's discussion of functional organs (this volume) the unity of external and internal is explored (see also Kaptelinin, this volume, chapters 3 and 5). Zinchenko's example of the relationship between Rostropovich and his cello (they are inextricably implicated in one another) invalidates simplistic explanations that divide internal and external and schemes that see context as external to people. People transform themselves profoundly through the acquisition of functional organs; context cannot be conceived as simply a set of external ``resources'' lying about. One's ability—and choice—to marshal and use resources is, rather, the result of specific historical and developmental processes in which a person is changed. A context cannot be reduced to an enumeration of people and artifacts; rather, the specific transformative relationship between people and artifacts, embodied in the activity theory notion of functional organ, is at the heart of any definition of context, or activity.

DISTRIBUTED COGNITION

The distributed cognition approach (which its practitioners refer to simply as distributed cognition, a convention I shall adopt here)

is a new branch of cognitive science devoted to the study of: the representation of knowledge both inside the heads of individuals and in the world ...; the propagation of knowledge between different individuals and artifacts ...; and the transformations which external structures undergo when operated on by individuals and artifacts.... By studying cognitive phenomena in this fashion it is hoped that an understanding of how intelligence is manifested at the systems level, as opposed to the individual cognitive level, will be obtained. (Flor and Hutchins 1991)

Distributed cognition asserts as a unit of analysis a cognitive system composed of individuals and the artifacts they use (Flor and Hutchins 1991; Hutchins 1991a). The cognitive system is something like what activity theorists would call an activity; for example, Hutchins (1991a) describes the activity of flying a plane, focusing on ``the cockpit system.'' Systems have goals; in the cockpit, for example, the goal is the ``successful completion of a flight.''5 Because the system is not relative to an individual but to a distributed collection of interacting people and artifacts, we cannot understand how a system achieves its goal by understanding ``the properties of individual agents alone, no matter how detailed the knowledge of the properties of those individuals might be'' (Hutchins 1991a). The cockpit, with its pilots and instruments forming a single cognitive system, can be understood only when we understand, as a unity, the contributions of the individual agents in the system and the coordination necessary among the agents to enact the goal, that is, to achieve ``the successful completion of a flight.'' (Hutchins 1994 studies shipboard navigation and makes similar points.)

Thus distributed cognition moves the unit of analysis to the system and finds its center of gravity in the functioning of the system, much as classic systems theory did (Wiener 1948; Ashby 1956; Bertalanffy 1968). While a distributed cognition analyst would probably, if pushed, locate system goals in the minds of the people who are part of the system, the intent is to redirect analysis to the systems level to reveal the functioning of the system itself rather than the individuals who are part of the system. Practitioners of distributed cognition sometimes refer to the ``functional system'' (instead of the ``cognitive system'') as their central unit of analysis (Hutchins 1994; Rogers and Ellis 1994), hinting at an even further distance from the notion of the individual that the term cognitive cannot help but suggest.

Distributed cognition is concerned with structure—representations inside and outside the head—and the transformations these structures undergo. This is very much in line with traditional cognitive science (Newell and Simon 1972) but with the difference that cooperating people and artifacts are the focus of interest, not just individual cognition ``in the head.'' Because of the focus on representations—both internal to an individual and those created and displayed in artifacts—an important emphasis is on the study of such representations. Distributed cognition tends to provide finely detailed analyses of particular artifacts (Norman 1988; Norman and Hutchins 1988; Nardi and Miller 1990; Zhang 1990; Hutchins 1991a; Nardi et al. 1993) and to be concerned with finding stable design principles that are widely applicable across design problems (Norman 1988, 1991; Nardi and Zarmer 1993).

The other major emphasis of distributed cognition is on understanding the coordination among individuals and artifacts, that is, how individual agents align and share within a distributed process (Flor and Hutchins 1991; Hutchins 1991a, 1991b; Nardi and Miller 1991). For example, Flor and Hutchins (1991) studied how two programmers performing a software maintenance task coordinated the task among themselves. Nardi and Miller (1991) studied the spreadsheet as a coordinating device facilitating the distribution and exchange of domain knowledge within an organization. In these analyses, shared goals and plans, and the particular characteristics of the artifacts in the system, are important determinants of the interactions and the quality of collaboration.

DIFFERENCES BETWEEN ACTIVITY THEORY, SITUATED ACTION MODELS, AND DISTRIBUTED COGNITION

All three frameworks for analyzing context that we have considered are valuable in underscoring the need to look at real activity in real situations and in squarely facing the conflux of multifaceted, shifting, intertwining processes that comprise human thought and behavior. The differences in the frameworks should also be considered as we try to find a set of concepts with which to confront the problem of context in HCI studies.

The Structuring of Activity

An important difference between activity theory and distributed cognition, on the one hand, and situated action, on the other hand, is the treatment of motive and goals. In activity theory, activity is shaped first and foremost by an object held by the subject; in fact, we are able to distinguish one activity from another only by virtue of their differing objects (Leont'ev 1974; Kozulin 1986; Kuutti 1991, this volume). Activity theory emphasizes motivation and purposefulness and is ``optimistic concerning human self-determination'' (Engeström 1990). A distributed cognition analysis begins with the positing of a system goal, which is similar to the activity theory notion of object, except that a system goal is an abstract systemic concept that does not involve individual consciousness.

Attention to the shaping force of goals in activity theory and distributed cognition, be they conscious human motives or systemic goals, contrasts with the contingent, responsive, improvisatory emphasis of situated action. In situated action, one activity cannot be distinguished from another by reference to an object (motive); in fact, Lave (1988) argues that ``goals [are not] a condition for action.... An analytic focus on direct experience in the lived-in world leads to ... the proposition that goals are constructed, often in verbal interpretation'' (emphasis in original). In other words, goals are our musings out loud about why we did something after we have done it; goals are ``retrospective and reflexive'' (Lave 1988).

In a similar vein, Suchman (1987), following Garfinkel (1967), asserts that ``a statement of intent generally says very little about the action that follows.'' If we appear to have plans to carry out our intent, it is because plans are ``an artifact of our reasoning about action, not ... the generative mechanism of action'' (emphasis in original). Suchman (1987) says that plans are ``retrospective reconstructions.''6 The position adopted by Lave (1988) and Suchman (1987) concerning goals and plans is that they are post hoc rationalizations for actions whose meaning can arise only within the immediacy of a given situation.

Lave (1988) asks the obvious question about this problematic view of intentionality: ``If the meaning of activity is constructed in action ... from whence comes its intentional character, and indeed its meaningful basis?'' Her answer, that ``activity and its values are generated simultaneously,'' restates her position but does not explicate it. Winograd and Flores (1986) also subscribe to this radically situated view, using the colorful term ``thrownness'' (after Heidegger) to argue that we are actively embedded in, or ``thrown into,'' an ongoing situation that directs the flow of our actions much more than reflection or the use of durable mental representations.

In activity theory and distributed cognition, by contrast, an object-goal is the beginning point of analysis. An object precedes and motivates activity. As Leont'ev (1974) states, ``Performing operations that do not realize any kind of goal-directed action [and recursively, a motive] on the subject's part is like the operation of a machine that has escaped human control.''

In activity theory and distributed cognition, an object is (partially) determinative of activity; in situated action, every activity is by definition uniquely constituted by the confluence of the particular factors that come together to form one ``situation.'' In a sense, situated action models are confined to what activity theorists would call the action and operation levels (though lacking a notion of goal at the action level in the activity theory sense). Situated action concentrates, at these levels, on the way people orient to changing conditions. Suchman's (1987) notion of ``embodied skills'' is similar to the notion of operations, though less rich than the activity theory construct, which grounds operations in consciousness, specifies that operations depend on certain conditions obtaining, and holds that they may dynamically transform into actions when conditions change.

While in principle one could reasonably focus one's efforts on understanding the action and operation levels while acknowledging the importance of the object level, neither Lave (1988) nor Suchman (1987), as we have seen, does this. On the contrary, the very idea of an object's generating activity is rejected; objects (goals) and plans are ``retrospective reconstructions,'' post hoc ``artifacts of reasoning about action,'' after action has taken place. Why people would construct such explanations is an interesting question not addressed in these accounts. And why other people would demand or believe such retrospective reconstructions is another question to be addressed by this line of reasoning.

Situated action models have a slightly behavioristic undercurrent in that it is the subject's reaction to the environment (the ``situation'') that finally determines action. What the analyst observes is cast as a response (the subject's actions/operations) to a stimulus (the ``situation''). The mediating influences of goals, plans, objects, and mental representations that would order the perception of a situation are absent in the situated view. There is no attempt to catalog and predict invariant reactions (as in classical behaviorism), as situations are said to vary unpredictably, but the relation between actor and environment is one of reaction in this logic.7 People ``orient to a situation'' rather than proactively generating activity rich with meaning reflective of their interests, intentions, and prior knowledge.

Suchman and Trigg (1991) cataloged their research methods in describing how they conduct empirical studies. What is left out is as interesting as what is included. The authors report that they use (1) a stationary video camera to record behavior and conversation; (2) ``shadowing,'' or following around an individual to study his or her movements; (3) tracing of artifacts and instrumenting of computers to audit usage; and (4) event-based analysis tracking individual tasks at different locations in a given setting. Absent from this catalog is the use of interviewing; interviews are treated as more or less unreliable accounts of idealized or rationalized behavior: subjectively reported goals are ``verbal interpretation'' (Lave 1988) and plans are ``retrospective reconstructions'' (Suchman 1987). Situated action analyses rely on recordable, observable behavior that is ``logged'' through analysis of a videotape or other record (Suchman and Trigg 1993; Jordan and Henderson 1994).8 Accounts from study participants describing in their own words what they think they are doing, and why, such as those in this book by Bellamy, Bødker, Christiansen, Engeström and Escalante, Holland and Reeves, and Nardi, are not a focal point of situated action analyses.

Activity theory has something interesting to tell us about the value of interview data. It has become a kind of received wisdom in the HCI community that people cannot articulate what they are doing (a notion sometimes used as a justification for observational studies and sometimes used to avoid talking to users at all). This generalization is true, however, primarily at the level of operations; it is certainly very difficult to say how you type, or how you see the winning pattern on the chessboard, or how you know when you have written a sentence that communicates well. But this generalization does not apply to the higher conscious levels of actions and objects; ask a secretary what the current problems are with the boss, or an effective executive what his goals are for the next quarter, and you will get an earful!

Skillful interviewing, or the need to teach someone how to do something, often brings operations to the subject's conscious awareness so that even operations can be talked about, at least to some degree. Dancers, for example, use imagery and other verbal techniques to teach dance skills that are extremely difficult to verbalize. The ability to bring operations to a conscious level, even if only partially, is an aspect of the dynamism of the levels of activity as posited by activity theory. When the subject is motivated (e.g., by wishing to cooperate with a researcher or by the desire to teach), at least some operational material can be retrieved (see Bødker, this volume). The conditions fostering such a dynamic move to the action level of awareness may include skillful probing by an interviewer.

In situated action, what constitutes a situation is defined by the researcher; there is no definitive concept such as object that marks a situation. The Leont'evian notion of object and goals remaining constant while actions and operations change because of changing conditions is not possible in the situated action framework, which identifies the genesis of action as an indivisible conjunction of particularities giving rise to a unique situation. Thus we find a major difference between activity theory and situated action: in the former, the structuring of activity is determined in part, and in important ways, by human intentionality before the unfolding in a particular situation; in situated action, activity can be known only as it plays out in situ. In situated action, goals and plans cannot even be realized until after the activity has taken place, at which time they become constructed rationalizations for activity that is wholly created in the crucible of a particular situation. In terms of identifying activity, activity theory provides the more satisfying option of taking a definition of an activity directly from a subjectively defined object rather than imposing a definition from the researcher's view.

These divergent notions of the structuring of activity, and the conceptual tools that identify one activity distinctly from another, are important for comparative work in studies of human-computer interaction. A framework that provides a clear way to demarcate one activity from another provides more comparative power than one that does not. Analyses that are entirely self-contained, in the way that a truly situated description of activity is, provide little scope for comparison. The level of analysis of situated action models—at the moment-by-moment level—would seem to be too low for comparative work. Brooks (1991) criticizes human-factors task analysis as being too low level in that all components in an analysis must ``be specified at as atomic a level as possible.'' This leads to an ad hoc set of tasks relevant only to a particular domain and makes cross-task comparison difficult (Brooks 1991). A similar criticism applies to situated action models, in which a focus on moment-by-moment actions leads to detailed descriptions of highly particularistic activities (such as pricing cheeses in a bin or measuring out cottage cheese) that are not likely to be replicated across contexts. Most crucially, no tools for pulling out a higher-level description from a set of observations are offered, as they are in activity theory.

Persistent Structures

An important question for the study of context is the role that persistent structures such as artifacts, institutions, and cultural values play in shaping activity. To what extent should we expend effort analyzing the durable structures that stretch across situations and activities that cannot be properly described as simply an aspect of a particular situation?

For both activity theory and distributed cognition, persistent structures are a central focus. Activity theory is concerned with the historical development of activity and the mediating role of artifacts. Leont'ev (following work by Vygotsky) considered the use of tools to be crucial: ``A tool mediates activity that connects a person not only with the world of objects, but also with other people. This means that a person's activity assimilates the experience of humanity.'' Distributed cognition offers a similar notion; for example, Hutchins (1987) discusses ``collaborative manipulation,'' the process by which we take advantage of artifacts designed by others, sharing good ideas across time and space. Hutchins's example is a navigator using a map: the cartographer who created the map contributes, every time the navigator uses the map, to a remote collaboration in the navigator's task.

Situated action models less readily accommodate durable structures that persist over time and across different activities. To the extent that activity is truly seen as ``situated,'' persistent, durable structures that span situations, and can thus be described and analyzed independent of a particular situation, will not be central. It is likely, however, that situated action models, especially those concerned with the design of technology, will allow some latitude in the degree of adherence to a purist view of situatedness, to allow for the study of cognitive and structural properties of artifacts and practices as they span situations. Indeed, in recent articles we find discussion of ``routine practices'' (Suchman and Trigg 1991) and ``routine competencies'' (Suchman 1993) to account for the observed regularities in the work settings studied. The studies continue to report detailed episodic events rich in minute particulars, but weave in descriptions of routine behavior as well.

Situated action accounts may then exhibit a tension between an emphasis on that which is emergent, contingent, improvisatory and that which is routine and predictable. It remains to be seen just how this tension resolves—whether an actual synthesis emerges (more than simple acknowledgment that both improvisations and routines can be found in human behavior) or whether the claims to true situatedness that form the basis of the critique of cognitive science cede some importance to representations ``in the head.'' The appearance of routines in situated action models opens a chink in the situated armor with respect to mental representations; routines must be known and represented somehow. Routines still circumambulate notions of planful, intentional behavior; being canned bits of behavior, they obviate the need for active, conscious planning or the formulation of deliberate intentions or choices. Thus the positing of routines in situated action models departs from notions of emergent, contingent behavior but is consistent in staying clear of plans and motives.

Of the three frameworks, distributed cognition has taken most seriously the study of persistent structures, especially artifacts. The emphasis on representations and the transformations they undergo brings persistent structures to center stage. Distributed cognition studies provide in-depth analyses of artifacts such as nomograms (Norman and Hutchins 1988), navigational tools (Hutchins 1990), airplane cockpits (Hutchins 1991a), spreadsheets (Nardi and Miller 1990, 1991), computer-aided design (CAD) systems (Petre and Green 1992), and even everyday artifacts such as door handles (Norman 1988). In these analyses, the artifacts are studied as they are actually used in real situations, but the properties of the artifacts are seen as persisting across situations of use, and it is believed that artifacts can be designed or redesigned with respect to their intrinsic structure as well as with respect to specific situations of use. For example, a spreadsheet table is an intrinsically good design (from a perceptual standpoint) for a system in which a great deal of dense information must be displayed and manipulated in a small space (Nardi and Miller 1990). Hutchins's (1991a) analysis of cockpit devices considers the memory requirements they impose. Norman (1988) analyzes whether artifacts are designed to prevent users from doing unintended (and unwanted) things with them. Petre and Green (1992) establish requirements for graphical notations for CAD users based on users' cognitive capabilities. In these studies, an understanding of artifacts is animated by observations made in real situations of their use, but important consideration is also given to the relatively stable cognitive and structural properties of the artifacts that are not bound to a particular situation of use.

Distributed cognition has also produced analyses of work practices that span specific situational contexts. For example, Seifert and Hutchins (1988) studied cooperative error correction on board large ships, finding that virtually all navigational errors were collaboratively ``detected and corrected within the navigation team.'' Gantt and Nardi (1992) found that organizations that make intensive use of CAD software may create formal in-house support systems for CAD users composed of domain experts (such as drafters) who also enjoy working with computers. Rogers and Ellis (1994) studied computer-mediated work in engineering practice. Symon et al. (1993) analyzed the coordination of work in a radiology department in a large hospital. Nardi et al. (1993) studied the coordination of work during neurosurgery afforded by video located within the operating room and at remote locations in the hospital. A series of studies on end user computing have found a strong pattern of cooperative work among users of a variety of software systems in very different arenas, including users of word processing programs (Clement 1990), spreadsheets (Nardi and Miller 1990, 1991), UNIX (Mackay 1990), a scripting language (MacLean et al. 1990), and CAD systems (Gantt and Nardi 1992).

In these studies the work practices described are not best analyzed as a product of a specific situation but are important precisely because they span particular situations. These studies develop points at a high level of analysis; for example, simply discovering that application development is a collaborative process has profound implications for the design of computer systems (Mackay 1990; Nardi 1993). Moment-by-moment actions, which would make generalization across contexts difficult, are not the key focus of these studies, which look for broader patterns spanning individual situations.

People and Things: Symmetrical or Asymmetrical?

Kaptelinin (chapter 5, this volume) points out that activity theory differs fundamentally from cognitive science in rejecting the idea that computers and people are equivalent. In cognitive science, a tight information processing loop with inputs and outputs on both sides models cognition. It is not important whether the agents in the model are humans or things produced by humans (such as computers). (See also Bødker, this volume, on the tool perspective.)

Activity theory, with its emphasis on the importance of motive and consciousness—which belong only to humans—sees artifacts and people as different. Artifacts are mediators of human thought and behavior; people and things are not equivalent. Bødker (this volume) defines artifacts as instruments in the service of activities. In activity theory, people and things are unambiguously asymmetrical.

Distributed cognition, by contrast, views people and things as conceptually equivalent; people and artifacts are ``agents'' in a system. This is similar to traditional cognitive science, except that the scope of the system has been widened to include a collaborating set of artifacts and people rather than the narrow ``man-machine'' dyad of cognitive science.

While treating each node in a system as an ``agent'' has a certain elegance, it leads to a problematic view of cognition. We find in distributed cognition the somewhat illogical notion that artifacts are cognizing entities. Flor and Hutchins (1991) speak of ``the propagation of knowledge between different individuals and artifacts.'' But an artifact cannot know anything; it serves as a medium of knowledge for a human. A human may act on a piece of knowledge in unpredictable, self-initiated ways, according to socially or personally defined motives. A machine's use of information is always programmatic. Thus a theory that posits equivalence between human and machine damps out sources of systemic variation and contradiction (in the activity theory sense; see Kuutti, this volume) that may have important ramifications for a system. The activity theory notion of artifacts as mediators of cognition seems a more reasoned way to discuss relations between artifacts and people.

Activity theory instructs us to treat people as sentient, moral beings (Tikhomirov 1972), a stance not required in relation to a machine and often treated as optional with respect to people when they are viewed simply as nodes in a system. The activity theory position would seem to hold greater potential for leading to a more responsible technology design in which people are viewed as active beings in control of their tools for creative purposes, rather than as automatons whose operations are to be automated away, or nodes whose rights to privacy and dignity are not guaranteed. Engeström and Escalante (this volume) apply the activity theory approach of asymmetrical human-thing relations to their critique of actor-network theory.

In an analysis of the role of Fitts's law in HCI studies undertaken from an activity theory perspective, Bertelsen (1994) argues that Fitts's ``law'' is actually an effect, subject to contextual variations, and throws into question the whole notion of the person as merely a predictable mechanical ``channel.'' Bertelsen notes that ``no matter how much it is claimed that Fitts' Law is merely a useful metaphor, it will make us perceive the human being as a channel. The danger is that viewing the human being as a channel will make us treat her as a mechanical device.... Our implicit or explicit choice of world view is also a choice of the world we want to live in; disinterested sciences do not exist'' (Bertelsen 1994). Seeing Fitts's findings as an effect, subject to contextual influence, helps us to avoid the depiction of the user as a mechanical part.
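Bertelsen's point about the ``channel'' view can be made concrete by looking at what the law actually computes. The sketch below uses the Shannon formulation common in HCI studies (MT = a + b·log2(D/W + 1)); the constants a and b are illustrative placeholders, not empirically fitted values:

```python
import math

def fitts_movement_time(a: float, b: float, distance: float, width: float) -> float:
    """Predict movement time (seconds) via the Shannon formulation of
    Fitts's law: MT = a + b * log2(D/W + 1).

    a, b     -- device/user constants, normally fitted from experiments
                (the values used below are illustrative only)
    distance -- distance from the start point to the target
    width    -- width of the target along the axis of motion
    """
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty

# Note what the model leaves out: given the same D and W, it always
# predicts the same time, regardless of the user's motive, attention,
# or context -- the reduction of the person to a fixed-capacity channel
# that Bertelsen warns against.
mt = fitts_movement_time(a=0.1, b=0.15, distance=210, width=30)  # ID = 3 bits
```

The formula itself contains no term for the subject; everything about the person is compressed into the fitted constants, which is precisely why treating it as an effect subject to contextual variation, rather than a law, changes how we see the user.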

Activity theory says, in essence, that we are what we do. Bertelsen sees Fitts's law as a tool of a particular kind of science that ``reduces the design of work environments, e.g., computer artifacts, to a matter of economical optimization.'' If we wish to design in such a manner, we will create a world of ruthless optimization and little else, but it is certainly not inevitable that we do so. However, no amount of evidence that people are capable of behaving opportunistically, contingently, and flexibly will inhibit the development and dispersal of oppressive technologies; Taylorization has made that clear. If we wish a different world, it is necessary to design humane and liberating technologies that create the world as we wish it to be.

There are never cut-and-dried answers, of course, when dealing with broad philosophical problems such as the definition of people and things, but activity theory at least engages the issue by maintaining that there is a difference and asking us to study its implications. Many years ago, Tikhomirov (1972) wrote, ``How society formulates the problem of advancing the creative content of its citizens' labor is a necessary condition for the full use of the computer's possibilities.''

Situated action models portray humans and things as qualitatively different. Suchman (1987) has been particularly eloquent on this point. But as I have noted, situated action models, perhaps inadvertently, may present people as reactive ciphers rather than fully cognizant human actors with self-generated agendas.

DECIDING AMONG THE THREE APPROACHES

All three approaches to the study of context have merit. The situated action perspective has provided a much-needed corrective to the rationalistic accounts of human behavior from traditional cognitive science. It exhorts us not to depend on rigidly conceived notions of inflexible plans and goals and invites us to take careful notice of what people are actually doing in the flux of real activity. Distributed cognition has shown how detailed analyses that combine the formal and cognitive properties of artifacts with observations on how artifacts are used can lead to understandings useful for design. Distributed cognition studies have also begun to generate a body of comparative data on patterns of work practices in varying arenas.

Activity theory and distributed cognition are very close in spirit, as we have seen, and it is my belief that the two approaches will mutually inform, and even merge, over time, though activity theory will continue to probe questions of consciousness outside the purview of distributed cognition as it is presently formulated. The main differences with which we should be concerned here are between activity theory and situated action. Activity theory seems to me to be considerably richer and deeper than the situated action perspective. 9 Although the critique of cognitive science offered by situated action analysts is cogent and has been extremely beneficial, the insistence on the ``situation'' as the primary determinant of activity is, in the long run, unsatisfying. What is a ``situation''? How do we account for variable responses to the same environment or ``situation'' without recourse to notions of object and consciousness?

To take a very simple example, let us consider three individuals, each going on a nature walk. The first walker, a bird watcher, looks for birds. The second, an entomologist, studies insects as he walks, and the third, a meteorologist, gazes at clouds. The walker will carry out specific actions, such as using binoculars, or turning over leaves, or looking skyward, depending on his or her interest. The ``situation'' is the same in each case; what differs is the subject's object. While we might define a situation to include some notion of the subject's intentions, as we have seen, this approach is explicitly rejected by situated action analysts. (See also Lave 1993.)

To take the example a step further, we observe that the bird watcher and the meteorologist might in some cases take exactly the same action from a behavioral point of view, such as looking skyward. But the observable action actually involves two very different activities for the subjects themselves. One is studying cloud formations, the other watching migrating ducks. The action of each, as seen on a videotape, for example, would appear identical; what differs is the subject's intent, interest, and knowledge of what is being looked at.

If we do not consider the subject's object, we cannot account for simple things such as, in the case of the bird watcher, the presence of a field guide to birds and perhaps a ``life list'' that she marks up as she walks along. 10 A bird watcher may go to great lengths to spot a tiny flycatcher high in the top of a tree; another walker will be totally unaware of the presence of the bird. The conscious actions and attention of the walker thus derive from her object. The bird watcher may also have an even longer-term object in mind as she goes along: adding all the North American birds to her life list. This object, very important to her, is in no way knowable from ``the situation'' (and not observable from a videotape). Activity theory gives us a vocabulary for talking about the walker's activity in meaningful subjective terms and gives the necessary attention to what the subject brings to a situation. 11 In significant measure, the walker construes and creates the situation by virtue of prior interest and knowledge. She is constrained by the environment in important ways, but her actions are not determined by it. As Davydov, Zinchenko, and Talyzina (1982) put it, the subject actively ```meets' the object with partiality and selectivity,'' rather than being ``totally subordinate to the effects of environmental factors ... the principle of reactivity is counterposed to the principle of the subject's activeness.''

It is also important to remember that the walker has consciously chosen an object and taken the necessary actions for carrying it out; she did not just suddenly and unexpectedly end up in the woods. Can we really say, as Suchman (1987) does, that her actions are ``ad hoc''? Situated action analyses often assume a ``situation'' that one somehow finds oneself in, without consideration of the fact that the very ``situation'' has already been created in part by the subject's desire to carry out some activity. For example, Suchman's famous canoeing example, intended to show that in the thick of things one abandons plans, is set up so that the ``situation'' is implicitly designated as ``getting your canoe through the falls'' (Suchman 1987). Surely the deck is stacked here. What about all the plotting and planning necessary to get away for the weekend, transport the canoe to the river, carry enough food, and so forth that must also be seen as definitive of the situation? It is only with the most mundane, plodding, and planful effort that one arrives ``at the falls.'' To circumscribe the ``situation'' as the glamorous, unpredictable moment of running the rapids is to miss the proverbial boat, as it were. An activity theory analysis instructs us to begin with the subjectively defined object as the point of analytical departure and thus will lead not simply to crystalline moments of improvisatory drama (whether measuring cottage cheese or running rapids) but to a more global view that encompasses the totality of an activity construed and constructed, in part, prior to its undertaking, with conscious, planful intent.

Holland and Reeves (this volume) studied the differing paths taken by three groups of student programmers all enrolled in the same class and all beginning in the same ``situation.'' The professor gave each group the same specific task to accomplish during the semester and the students' ``performances were even monitored externally from an explicit and continually articulated position.'' The students were all supposed to be doing the same assignment; they heard the same lectures and had the same readings and resources. But as Holland and Reeves document, the projects took radically different courses and had extremely variable outcomes because the students themselves redefined the object of the class. Our understanding of what happened here must flow from an understanding of how each group of students construed, and reconstrued, the class situation. The ``situation'' by itself cannot account for the fact that one group of students produced a tool that was chosen for demonstration at a professional conference later in the year; one group produced a program with only twelve lines of code (and still got an A!); and the third group ``became so enmeshed in [interpersonal struggles] that the relationships among its members frequently became the object of its work.''

Bellamy (this volume) observes that to achieve practical results such as successfully introducing technology into the classroom, it is necessary to understand and affect the objects of educators: ``to change the underlying educational philosophy of schools, designers must design technologies that support students' learning activities and design technologies that support the activities of educators and educational administrators. Only by understanding and designing for the complete situation of education ... will it be possible for technology to bring about pervasive educational reform.''

Situated action models make it difficult to go beyond the particularities of the immediate situation for purposes of generalization and comparison. One immerses oneself in the minutiae of a particular situation, and while the description may feel fresh, vivid, and ``on-the-ground'' as one reads it, when a larger task such as comparison is attempted, it is difficult to carry the material over. One finds oneself in a claustrophobic thicket of descriptive detail, lacking concepts with which to compare and generalize. The lack of conceptual vocabulary and the appeal to the ``situation'' itself in its moment-by-moment details do not lend themselves to higher-order scientific tasks where some abstraction is necessary.

It is appropriate to problematize notions of comparison and generalization in order to sharpen comparisons and generalizations, but it is fruitless to dispense with these foundations of scientific thought. A pure and radically situated view would, by definition, render comparison and generalization logically at odds with notions of emergence, contingency, improvisation, and description based on in situ detail and point of view. (I am not saying any of the situated theorists cited here are this radical; I am playing out the logical conclusion of the ideas.) Difficult though it may be to compare and generalize when the subject matter is people, it is nonetheless important if we are to do more than simply write one self-contained descriptive account after another. The more precise, careful, and sensitive comparisons and generalizations are, the better. This is true not only from the point of view of science but also of technology design. Design, a practical activity, is going to proceed apace, and it is in our best interests to provide comparisons and generalizations based on nuanced and closely observed data, rather than rejecting the project of comparison and generalization altogether.

Holland and Reeves compare their study to Suchman's (1994) study, which centers on a detailed description of how operations room personnel at an airport coordinated action to solve the problems of a dysfunctional ramp. Holland and Reeves point out that they themselves might have focused on a similar minutely observed episode such as studying how the student programmers produced time logs. However, they argue that they would then have missed the bigger picture of what the students were up to if they had, for example, concentrated on ``videotapes and transcriptions ... show[ing], the programmers' use of linguistic markers in concert with such items as physical copies of the time-log chart and the whiteboard xeroxes in order to orient joint attention, for example.''

Holland and Reeves's analysis argues for a basic theoretical orientation that accommodates a longer time horizon than is typical of a ``situation.'' They considered the entire three-month semester as the interesting frame of reference for their analysis, while Suchman looked at a much shorter episode, more easily describable as a ``situation.'' (See also Suchman and Trigg 1993, where the analysis centers on an hour and a half of videotape.) Holland and Reeves's analysis relies heavily on long-term participant observation and verbal transcription; Suchman focuses on the videotape of a particular episode of the operations room in crisis. In comparing these two studies, we see how analytical perspective leads to a sense of what is interesting and determines where research effort is expended. Situated action models assume the primacy of a situation in which moment-by-moment interactions and events are critical, which leads away from a longer time frame of analysis. Videotape is a natural medium for this kind of analysis, and the tapes are looked at with an eye to the details of a particular interaction sequence (Jordan and Henderson 1994). By contrast, an activity theory analysis has larger scope for the kind of longer-term analysis provided by Holland and Reeves (though videotapes may certainly provide material of great interest to a particular activity theory analysis as in Bødker, this volume, and Engeström and Escalante, this volume).

Of course the observation that theory and method are always entangled is not new; Hegel (1966) discussed this problem. Engeström (1993) summarized Hegel's key point: ``Methods should be developed or `derived' from the substance, as one enters and penetrates deeper into the object of study.'' And Vygotsky (1978) wrote, ``The search for method becomes one of the most important problems of the entire enterprise of understanding the uniquely human forms of psychological activity. In this case, the method is simultaneously prerequisite and product, the tool and the result of the study.''

Situated action models, then, have two key problems: (1) they do not account very well for observed regularities and durable, stable phenomena that span individual situations, and (2) they ignore the subjective. The first problem is partially addressed by situated action accounts that posit routines of one type or another (as discussed earlier). This brings situated action closer to activity theory in suggesting the importance of the historical continuity of artifacts and practice. It weakens true claims of ``situatedness,'' which highlight the emergent, contingent aspects of action.

There has been a continuing aversion to incorporating the subjective in situated action models, which have held fast in downplaying consciousness, intentionality, plans, motives, and prior knowledge as critical components of human thought and behavior (Suchman 1983, 1987; Lave 1988, 1993; Suchman and Trigg 1991; Lave and Wenger 1991; Jordan and Henderson 1994). This aversion appears to spring from the felt need to continue to defend against the overly rationalistic models of traditional cognitive science (see Cognitive Science 17, 1993 for the continuing debate) and the desire to encourage people to look at action in situ. While these are laudable motivations, it is possible to take them too far. It is severely limiting to ignore motive and consciousness in human activity and constricting to confine analyses to observable moment-by-moment interactions. Aiming for a broader, deeper account of what people are up to as activity unfolds over time, and reaching for a way to incorporate subjective accounts of why people do what they do and how prior knowledge shapes the experience of a given situation, is the more satisfying path in the long run. Kaptelinin (chapter 5, this volume) notes that a fundamental question dictated by an activity theory analysis of human-computer interaction is: ``What are the objectives of computer use by the user and how are they related to the objectives of other people and the group/organization as a whole?'' This simple question leads to a different method of study and a different kind of result from a focus on a situation defined in its moment-by-moment particulars.

METHODOLOGICAL IMPLICATIONS OF ACTIVITY THEORY

To summarize the practical methodological implications for HCI studies of what we have been discussing in this section, we see that activity theory implies:

  1. A research time frame long enough to understand users' objects, including, where appropriate, changes in objects over time and their relation to the objects of others in the setting studied. Kuutti (this volume) observes that ``activities are longer-term formations and their objects cannot be transformed into outcomes at once, but through a process consisting often of several steps or phases.'' Holland and Reeves (this volume) document changing objects in their study of student programmers. Engeström and Escalante (this volume) trace changes in the objects of the designers of the Postal Buddy. Christiansen (this volume) shows how actions can become objectified, again a process of change over time.
  2. Attention to broad patterns of activity rather than narrow episodic fragments that fail to reveal the overall direction and import of an activity. The empirical studies in this book demonstrate the methods and tools useful for analyzing broad patterns of activity. Looking at smaller episodes can be useful, but not in isolation. Bødker (this volume) describes her video analysis of episodes of use of a computer artifact: ``Our ethnographic fieldwork was crucial to understanding the sessions in particular with respect to contextualization.'' 12 Engeström and Escalante apply the same approach.
  3. The use of a varied set of data collection techniques including interviews, observations, video, and historical materials, without undue reliance on any one method (such as video). Bødker, Christiansen, Engeström and Escalante, and Holland and Reeves (this volume) show the utility of historical data (see also McGrath 1990; Engeström 1993).
  4. A commitment to understanding things from users' points of view, as in, for example, Holland and Reeves (this volume). Bellamy (this volume) underscores the practical need for getting the ``natives''' point of view in her study of technology in the classroom.

For purposes of technology design, then, these four methodological considerations suggest a phased

Human-computer interaction studies are a long way from the ideal set out by Brooks (1991): a corpus of knowledge that identifies the properties of artifacts and situations that are most significant for design and which permits comparison over domains, generates high-level analyses, and suggests actual designs. However, with a concerted effort by researchers to apply a systematic conceptual framework encompassing the full context in which people and technology come together, much progress can be made. A creative synthesis of activity theory as a backbone for analysis, leavened by the focus on representations of distributed cognition, and the commitment to grappling with the perplexing flux of everyday activity of the situated action perspective, would seem a likely path to success.

ACKNOWLEDGMENTS

My grateful thanks to Rachel Bellamy, Lucy Berlin, Danielle Fafchamps, Vicki O'Day, and Jenny Watts for stimulating discussions of the problems of studying context. Kari Kuutti provided valuable commentary on an earlier draft of the chapter. Errors and omissions are my own.

NOTES

  1. This chapter is an expanded version of the paper that appeared in Proceedings East-West HCI Conference (pp. 352–359), St. Petersburg, Russia, August 4–8, 1992, used with permission of the publisher.
  2. I concentrate here on what Salomon (1993) calls the ``radical'' view of situatedness, to explore the most fundamental differences among the three perspectives.
  3. Lave (1988) actually argues for the importance of institutions, but her analysis does not pay much attention to them, focusing instead on fine-grained descriptions of the particular activities of particular individuals in particular settings.
  4. Weight Watchers is an organization that helps people lose weight. Dieters must weigh and measure their food to ensure that they will lose weight by carefully controlling their intake.
  5. The word goal in everyday English usage is generally something like what activity theorists call an object in that it connotes a higher-level motive.
  6. Suchman (1987) also says that plans may be ``projective accounts'' of action (as well as retrospective reconstructions), but it is not clear what the difference is between a conventional notion of plan and a ``projective account.''
  7. Rhetorically, the behavioristic cast of situated action descriptions is reflected in the use of impersonal referents to name study participants when reporting discourse. For example, study participants are referred to as ``Shopper'' in conversational exchanges with the anthropologist in Lave (1988), or become ciphers, e.g., A, B (Suchman 1987), or initials denoting the work role of interest, such as ``BP'' for baggage planner (Suchman and Trigg 1991). The use of pseudonyms to suggest actual people would be more common in a typical ethnography.
  8. A good overview of the use of video for ``interaction analysis'' in which moment-by-moment interactions are the focus of study is provided by Jordan and Henderson (1994). They posit that understanding what someone ``might be thinking or intending'' must rely on ``evidence ... such as errors in verbal production or certain gestures and movements'' (emphasis in original). The ``evidence'' is not a verbal report by the study participant; it must be something visible on the tape—an observable behavior such as a verbal mistake. Jordan and Henderson observe that intentions, motivations and so forth ``can be talked about only by reference to evidence on the tape'' (emphasis in original). The evidence, judging by all their examples, does not include the content of what someone might say on the tape but only ``reactions,'' to use their word, actually seen on the tape.

This is indeed a radical view of research. Does it mean that all experimental and naturalistic study in which someone is said to think or intend that has heretofore been undertaken and for which there are no video records does not have any ``evidence''? Does it mean that a researcher who has access only to the tapes has as good an idea of what study participants are up to as someone who has done lengthy participant-observation? The answers would appear to be yes since the ``evidence'' is, supposedly, encased in the tapes. In the laboratory where Jordan and Henderson work, the tapes are indeed analyzed by researchers who have not interacted personally with the study participants (Jordan and Henderson 1994). While certainly a great deal can be learned this way, it would also seem to limit the scope and richness of analysis. Much of interest happens outside the range of a video camera. The highly interpretive nature of video analysis has not been acknowledged by its supporters. The method is relatively new and in the first flush of enthusiastic embrace. Critiques will follow; they are being developed by various researchers taking a hard look at video.

Jordan and Henderson do invite study participants into the lab to view the tapes and comment on them. This seems like a very interesting and fruitful idea. However, their philosophy is to try to steer informants toward their own epistemology—that is, that what is on the video is reality—not some other subjective reality the study participants might live with. As Jordan and Henderson (1994) say, ``elicitation'' based on viewing tapes ``has the advantage of staying much closer to the actual events [than conventional interviews]'' (emphasis added).

  9. Rogers and Ellis (1994) make this same argument, but for distributed cognition. However, they do not consider activity theory.

  10. Many bird watchers keep ``life lists'' in which they write down every individual bird species they have ever seen. They may want to see all the North American birds, or all the birds of Europe, or some other group of interest to them.
  11. I use the term subjective to mean ``emanating from a subject'' (in activity theory terms), not ``lacking in objectivity'' in the sense of detachment, especially scientific detachment (a common meaning in English).

  12. While Jordan and Henderson state that participant-observation is part of their method in interaction analysis, they use participant-observation to ``identify interactional `hot spots'—sites of activity for which videotaping promises to be productive'' (Jordan and Henderson 1994). Participant observation is used as a heuristic for getting at something very specific—interactions—and further, those particular interactions that will presumably be interesting on tape. In a sense, interaction analysis turns participant-observation on its head by selectively seeking events that will lend themselves to the use of a particular technology—video—rather than using video if and when a deeper understanding of some aspect of a culture is revealed in the process of getting to know the natives in their own terms, as in classic participant observation. Note that Bødker (this volume) pairs ethnographic fieldwork with video to provide for contextualization; she thus uses ethnography to add to what can be seen on the tape, while Jordan and Henderson use it to pare down what will appear on the tape and thus what will be analyzed as ``evidence.''

REFERENCES

Ashby, W. R. (1956). Introduction to Cybernetics. London: Chapman and Hall.

Bertalanffy, L. (1968). General System Theory. New York: George Braziller.

Bertelsen, O. (1994). Fitts' law as a design artifact: A paradigm case of theory in software design. In Proceedings East-West Human-Computer Interaction Conference (vol. 1, pp. 37–43). St. Petersburg, Russia, August 2–6.

Bødker, S. (1989). A human activity approach to user interfaces. Human-Computer Interaction 4:171–195.

Brooks, R. (1991). Comparative task analysis: An alternative direction for human-computer interaction science. In J. Carroll, ed., Designing Interaction: Psychology at the Human-Computer Interface. Cambridge: Cambridge University Press.

Chaiklin, S., and Lave, J. (1993). Understanding Practice: Perspectives on Activity and Context. Cambridge: Cambridge University Press.

Clement, A. (1990). Cooperative support for computer work: A social perspective on the empowering of end users. In Proceedings of CSCW'90 (pp. 223–236). Los Angeles, October 7–10.

Davydov, V., Zinchenko, V., and Talyzina, N. (1982). The problem of activity in the works of A. N. Leont'ev. Soviet Psychology 21:31–42.

Engeström, Y. (1990). Activity theory and individual and social transformation. Opening address at 2d International Congress for Research on Activity Theory, Lahti, Finland, May 21–25.

Engeström, Y. (1993). Developmental studies of work as a testbench of activity theory. In S. Chaiklin and J. Lave, eds., Understanding Practice: Perspectives on Activity and Context (pp. 64–103). Cambridge: Cambridge University Press.

Fafchamps, D. (1991). Ethnographic workflow analysis. In H.-J. Bullinger, ed., Human Aspects in Computing: Design and Use of Interactive Systems and Work with Terminals (pp. 709–715). Amsterdam: Elsevier Science Publishers.

Flor, N., and Hutchins, E. (1991). Analyzing distributed cognition in software teams: A case study of team programming during perfective software maintenance. In J. Koenemann-Belliveau et al., eds., Proceedings of the Fourth Annual Workshop on Empirical Studies of Programmers (pp. 36–59). Norwood, NJ: Ablex Publishing.

Gantt, M., and Nardi, B. (1992). Gardeners and gurus: Patterns of cooperation among CAD users. In Proceedings CHI'92 (pp. 107–118). Monterey, California, May 3–7.

Garfinkel, H. (1967). Studies in Ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.

Goodwin, C., and Goodwin, M. (1993). Seeing as situated activity: Formulating planes. In Y. Engeström and D. Middleton, eds., Cognition and Communication at Work. Cambridge: Cambridge University Press.

Hegel, G. (1966). The Phenomenology of Mind. London: George Allen & Unwin.

Hutchins, E. (1987). Metaphors for interface design. ICS Report 8703. La Jolla: University of California, San Diego.

Hutchins, E. (1990). The technology of team navigation. In J. Galegher, ed., Intellectual Teamwork. Hillsdale, NJ: Lawrence Erlbaum.

Hutchins, E. (1991a). How a cockpit remembers its speeds. Ms. La Jolla: University of California, Department of Cognitive Science.

Hutchins, E. (1991b). The social organization of distributed cognition. In L. Resnick, ed., Perspectives on Socially Shared Cognition (pp. 283–287). Washington, DC: American Psychological Association.

Hutchins, E. (1994). Cognition in the Wild. Cambridge, MA: MIT Press.

Kozulin, A. (1986). The concept of activity in Soviet psychology. American Psychologist 41(3):264–274.

Kuutti, K. (1991). Activity theory and its applications to information systems research and development. In H.-E. Nissen, ed., Information Systems Research (pp. 529–549). Amsterdam: Elsevier Science Publishers.

Jordan, B., and Henderson, A. (1994). Interaction analysis: Foundations and practice. IRL Technical Report. Palo Alto, IRL.

Lave, J. (1988). Cognition in Practice. Cambridge: Cambridge University Press.

Lave, J. (1993). The practice of learning. In S. Chaiklin and J. Lave, eds., Understanding Practice: Perspectives on Activity and Context. Cambridge: Cambridge University Press.

Lave, J., and Wenger, E. (1991). Situated Learning: Legitimate Peripheral Participation. Cambridge: Cambridge University Press.

Leont'ev, A. (1974). The problem of activity in psychology. Soviet Psychology 13(2):4–33.

Leont'ev, A. (1978). Activity, Consciousness, and Personality. Englewood Cliffs, NJ: Prentice-Hall.

Luria, A. R. (1979). The Making of Mind: A Personal Account of Soviet Psychology. Cambridge, MA: Harvard University Press.

McGrath, J. (1990). Time matters in groups. In J. Galegher, R. Kraut, and C. Egido, eds., Intellectual Teamwork: Social and Technological Foundations of Cooperative Work (pp. 23–61). Hillsdale, NJ: Lawrence Erlbaum.

Mackay, W. (1990). Patterns of sharing customizable software. In Proceedings CSCW'90 (pp. 209–221). Los Angeles, October 7–10.

MacLean, A., Carter, K., Lovstrand, L., and Moran, T. (1990). User-tailorable systems: Pressing the issues with buttons. In Proceedings, CHI'90 (pp. 175–182). Seattle, April 1–5.

Nardi, B. (1993). A Small Matter of Programming: Perspectives on End User Computing. Cambridge, MA: MIT Press.

Nardi, B., and Miller, J. (1990). The spreadsheet interface: A basis for end user programming. In Proceedings of Interact'90 (pp. 977–983). Cambridge, England, August 27–31.

Nardi, B., and Miller, J. (1991). Twinkling lights and nested loops: Distributed problem solving and spreadsheet development. International Journal of Man-Machine Studies 34:161–184.

Nardi, B., and Zarmer, C. (1993). Beyond models and metaphors: Visual formalisms in user interface design. Journal of Visual Languages and Computing 4:5–33.

Nardi, B., Schwarz, H., Kuchinsky, A., Leichner, R., Whittaker, S., and Sclabassi, R. (1993). Turning away from talking heads: The use of video-as-data in neurosurgery. In Proceedings INTERCHI'93 (pp. 327–334). Amsterdam, April 24–28.

Newell, A., and Simon, H. (1972). Human Problem Solving. Englewood Cliffs, NJ: Prentice-Hall.

Newman, D., Griffin, P., and Cole, M. (1989). The Construction Zone: Working for Cognitive Change in School. Cambridge: Cambridge University Press.

Norman, D. (1988). The Psychology of Everyday Things. New York: Basic Books.

Norman, D. (1991). Cognitive artifacts. In J. Carroll, ed., Designing Interaction: Psychology at the Human Computer Interface. New York: Cambridge University Press.

Norman, D., and Hutchins, E. (1988). Computation via direct manipulation. Final Report to Office of Naval Research, Contract No. N00014-85-C-0133. La Jolla: University of California, San Diego.

Petre, M., and Green, T. R. G. (1992). Requirements of graphical notations for professional users: Electronics CAD systems as a case study. Le Travail humain 55:47–70.

Raeithel, A. (1991). Semiotic self-regulation and work: An activity theoretical foundation for design. In R. Floyd et al., eds., Software Development and Reality Construction. Berlin: Springer Verlag.

Rogers, Y., and Ellis, J. (1994). Distributed cognition: An alternative framework for analysing and explaining collaborative working. Journal of Information Technology 9:119–128.

Salomon, G. (ed.). (1993). Distributed Cognitions: Psychological and Educational Considerations. Cambridge: Cambridge University Press.

Scribner, S. (1984). Studying working intelligence. In B. Rogoff and J. Lave, eds., Everyday Cognition: Its Development in Social Context. Cambridge, MA: Harvard University Press.

Seifert, C., and Hutchins, E. (1988). Learning from error. Education Report Number AD-A199. Washington, DC: American Society for Engineering.

Suchman, L. (1987). Plans and Situated Actions. Cambridge: Cambridge University Press.

Suchman, L. (1993). Response to Vera and Simon's situated action: A symbolic interpretation. Cognitive Science 17:71–76.

Suchman, L. (1994). Constituting shared workspaces. In Y. Engeström and D. Middleton, eds., Cognition and Communication at Work. Cambridge: Cambridge University Press.

Suchman, L., and Trigg, R. (1991). Understanding practice: Video as a medium for reflection and design. In J. Greenbaum and M. Kyng, eds., Design at Work: Cooperative Design of Computer Systems. Hillsdale, NJ: Lawrence Erlbaum.

Suchman, L., and Trigg, R. (1993). Artificial intelligence as craftwork. In S. Chaiklin and J. Lave, eds., Understanding Practice: Perspectives on Activity and Context. Cambridge: Cambridge University Press.

Symon, G., Long, K., Ellis, J., and Hughes, S. (1993). Information sharing and communication in conducting radiological examinations. Technical report. Cardiff, UK: Psychology Department, Cardiff University.

Tikhomirov, O. (1972). The psychological consequences of computerization. In O. Tikhomirov, ed., Man and Computer. Moscow: Moscow University Press.

Vygotsky, L. S. (1978). Mind in Society. Cambridge, MA: Harvard University Press.

Wertsch, J. (ed.). (1981). The Concept of Activity in Soviet Psychology. Armonk, NY: M. E. Sharpe.

Wiener, N. (1948). Cybernetics. New York: Wiley.

Winograd, T., and Flores, F. (1986). Understanding Computers and Cognition: A New Foundation for Design. Norwood, NJ: Ablex.

Zhang, J. (1990). The interaction of internal and external information in a problem solving task. UCSD Technical Report 9005. La Jolla: University of California, Department of Cognitive Science.